Fast and Conspicuous? Quantifying Salience With the Theory of Visual Attention

Alexander Krüger; Jan Tünnermann; Ingrid Scharlau

doi:10.5709/acp-0184-1

Adv Cogn Psychol. 2016 Mar 31;12(1):20–38. doi: 10.5709/acp-0184-1

PMCID: PMC4862317 PMID: 27168868

Abstract

Particular differences between an object and its surrounding cause salience, guide attention, and improve performance in various tasks. While much research has been dedicated to identifying which feature dimensions contribute to salience, much less regard has been paid to the quantitative strength of the salience caused by feature differences. Only a few studies systematically related salience effects to a common salience measure, and they are partly outdated in the light of new findings on the time course of salience effects. We propose Bundesen’s Theory of Visual Attention (TVA) as a theoretical basis for measuring salience and introduce an empirical and modeling approach to link this theory to data retrieved from temporal-order judgments. With this procedure, TVA becomes applicable to a broad range of salience-related stimulus material. Three experiments with orientation pop-out displays demonstrate the feasibility of the method. A 4th experiment substantiates its applicability to the luminance dimension.

Keywords: salience, visual attention, Bayesian inference, theory of visual attention, computational modeling

Introduction

As early as 1890, William James (1890, p.416) described a kind of attention caused by “an instinctive stimulus, a perception which, by reason of its nature rather than its mere force, appeals to one of our normal congenital impulses”.

Though over a century old and in an uncommon wording, the quote expresses the idea that some objects trigger basic attentional mechanisms that all humans share. These mechanisms are feature-specific instead of being based on sensory strength. This description fits the current idea of stimulus-driven or bottom-up attention. For both James’ description and the modern perspective, however, the question remains which features attract such attention. Among James’ rather uncommon examples are strange things, moving things, bright things, and metallic things. From today’s knowledge, we would argue that what matters is not simply the properties of an object but the context in which the object occurs. This relation is captured by the term salience (among others), which describes a local feature difference that attracts attention. Thus, a bright stimulus among other bright stimuli would not attract much attention, and neither would an object moving in the same direction and with the same speed as other moving objects.

James’ (1890) initial question which features are essential for guiding attention has been extensively studied within visual attention research (for a summary see Wolfe & Horowitz, 2004). However, much less research has addressed the strength of salience dimensions and their quantitative influence on attention, which is the focus of the present article. If you want to be seen, would it be better to be moving, or to be bright—or even metallic?

There are several, mostly model-based approaches to answer this question.

Early visual processing is based on the receptive fields of neurons tuned to particular features (e.g., Hubel & Wiesel, 1959, 1968), which are the source of bottom-up influences on perception and attention (for a review see Treue, 2003). The strength of these neurophysiological responses depends on the strength of the presented features (Zhang, Zhaoping, Zhou, & Fang, 2012). This strength and combinations of features of varying strength have predominantly been tackled using methods from engineering (e.g., Itti & Koch, 2001b; Zhao & Koch, 2013).

Computational modeling approaches make it possible to simulate retinotopic salience maps for natural input images (for a review see Frintrop, Rome, & Christensen, 2010). Different mathematical strategies have been explored to compute a salience value for every location in the image. Because of the difficulties of solving these problems algorithmically, machine learning techniques have been employed (Itti & Koch, 2001b; Zhao & Koch, 2013). Although such approaches may be applied in computer vision, it is unclear whether they correspond to salience in human attention. For instance, many computational models such as that by Itti and Koch (2001a) predict that a higher luminance contrast attracts more attention. Einhäuser and König (2003) experimentally manipulated the luminance contrast of images. The participants in their study had to carefully study natural and modified natural images. The correlation of luminance contrast and fixation probability, however, failed to confirm the model prediction.

The neurophysiological salience model by Li (2002) makes quantitative predictions about human performance in salience related tasks. Li assumes that the strength of salience is represented implicitly by the firing rate of retinotopic neurons in V1 that encode specific features or combinations of features. This model accounts qualitatively for a wide range of empirical findings like search asymmetries in visual search (e.g., Li, 1999). It simulates the neurophysiological processing of the visual information by a complex recurrent artificial neuronal network (Li, 2001). The firing rate of these artificial neurons can hence be regarded as a quantitative prediction. However, the model cannot yet account quantitatively for experimental data.

Another model focusing on salience-related human performance is the fourth version of the Guided Search model by Wolfe (2007). In this model, salience is handled by a module for the bottom-up guidance of attention. This guidance is modeled by individual channels tuned to specific features (e.g., steep, shallow, left, and right for orientation). It contains a simple mathematical function for the contribution of each orientation channel. Salience itself is then computed by pairwise comparisons of these values for all visible objects. Wolfe states that the precise shape of the function that determines the contribution of a channel to overall salience is not critical for the qualitative performance of the model. This statement makes it questionable whether the model may provide good quantitative predictions on this level although it qualitatively accounts for a wide range of empirical findings on visual search. As Wolfe himself concedes, not all quantitative aspects of human behavior in terms of response times and errors can be successfully predicted. In conclusion, models do not yet provide a general explanation of the quantitative strength of salience.

Some attempts to establish a quantitative measure of salience are based on the analysis of behavioral data. Among the few studies in this line of research are those by Nothdurft (1993, 2000). He asked participants to compare the conspicuousness of two singletons that are unique elements embedded in a display of homogeneous background elements. Each stimulus whose salience was to be measured was presented with a stimulus that was salient due to a luminance difference. To measure the salience of a stimulus, the salience of the reference (luminance) stimulus was systematically increased. By this means, Nothdurft (2000) related the feature dimensions motion, orientation, luminance, and color to each other and also compared combinations of features from different dimensions. He quantified salience by relating a salient stimulus to the luminance difference that would create the same salience via approximation of psychometric functions and calculating what one might call the point of subjective equal salience. This approach comes close to a general and theoretically well-founded quantification. Unfortunately, the results are difficult to replicate. While we could replicate Nothdurft’s findings using orientation and luminance, we also found that many participants showed no regular psychometric functions but rather a behaviour strongly influenced by guessing (unpublished pilot study). Similar difficulties were reported by Koene and Zhaoping (2007).

Starting from this need for a better behavioral method to quantify salience, Huang and Pashler (2005) came up with a search task for the biggest and brightest square in a display of several objects. The location of a small probe on its left or right side had to be reported to verify that the target was found. The dependent variable was the response time for a correct report. The display was randomly filled with other distractor squares. Salience was measured in these trials by introducing a salient key distractor. Its salience was quantified by examining the effect of the feature differences on response times. Via this quantification, Huang and Pashler related luminance and size to each other.

An additional aspect impeding the measurement of salience is its time course. Regarding the time course, several different ideas were discussed (e.g., Egeth & Yantis, 1997), with two types of temporal dynamics being especially important for the study of salience. (1) Salience-based progression of attention (e.g., Koch & Ullman, 1985) describes the shift of attention from the most salient spot in an image to the second most salient spot and so forth. (2) Time course of salience describes how the strength of salience effects varies over time. Salience effects increase from display onset to 100 or 150 ms (e.g., Couffe, Mizzi, & Michael, 2016; Kean & Lambert, 2003) and decay after approximately 300 ms. Evidence for this time course—which resembles the time course of attention (Olivers, 2007)—comes from a variety of different paradigms: probe detection (Dombrowe, Olivers, & Donk, 2010; Donk & Soesman, 2010), TOJs (Donk & Soesman, 2011), saccadic selection (Donk & van Zoest, 2008), and saccadic trajectories (van Zoest, Donk, & Van der Stigchel, 2012). This research implies that it is crucial to measure salience at specific points in time (a condition not met by Huang & Pashler, 2005).

The approaches discussed above consider or measure performance as an indicator of attention. They spend less effort on the quantification of salience itself. An approach that might provide such a quantification is Bundesen’s Theory of Visual Attention (TVA; Bundesen, 1998). It comprises a psychologically inspired, general formal explanation of visual attention and selection processes and allows attentional weights to be inferred for specific objects in a display. The attentional weight determines whether an object is encoded in visual short-term memory (VSTM) and, if so, how quickly it is encoded, that is, its processing speed. These parameters can possibly be used as a general quantification of salience in the sense that the strength of salience is the attentional weight of an object.

Although promising on an abstract level, TVA has only rarely been used to investigate salience (e.g., Nordfang, Dyrholm, & Bundesen, 2013). A possible reason is that in the item-report paradigms commonly used with TVA, the potential stimulus material is restricted to highly overlearned categories like digits and letters. The experimental paradigm requires a categorization because probabilities of stimulus categorizations are estimated. Hence, TVA is not directly applicable to salience research.

Recently, however, Tünnermann, Petersen, and Scharlau (2015) paved the way for such an application. Originally, they investigated whether the relatively faster perception of an attended stimulus in a pair is caused by speeded processing of this attended stimulus or decelerated processing of its unattended counterpart. Along with TVA-based item report, participants judged the temporal order (temporal-order judgment; TOJ) in which the stimuli appeared. Tünnermann et al. found that the attentional benefit originates from a combination of speeding up the attended and slowing down the unattended stimulus. This conclusion is based on a conventional TVA analysis. In the Discussion, however, they sketched a new approach. They suggested that data from TOJs might be directly modeled by TVA to obtain TVA’s attention parameters. At first sight, this might not seem ground-breaking, but the proposed method makes it possible to apply TVA-based analysis to any kind of stimulus. The aim of the present paper is to test the feasibility of this approach.

In a nutshell—details will be explained below in two sections on TVA and modeling of TOJ data—the method consists of having observers judge the temporal order of two arbitrary visual stimuli. The interval between the stimuli is varied over trials. Application of TVA to the observers’ judgments allows computing of processing speed, attentional weights, and overall attentional processing capacity. By manipulating the features of the stimulus, this method allows us to quantify salience in the form of these parameters. This approach can provide a theoretically well-founded, general quantification of salience.

The Theory of Visual Attention (TVA)

The present section provides a short summary of the relevant parts of TVA as a formal theory. Key terms for the modeling as well as the experiments are introduced, most importantly attentional weight and processing capacity. The section can, however, not provide a full introduction to TVA, for which we refer the interested reader to sources such as those by Bundesen (1998) and Bundesen, Habekost, and Kyllingsbæk (2005).

TVA was introduced as a unified theory of visual recognition and attentional selection. The theory achieves this by mathematically formalizing the processes associated with the processing of visual objects from presentation towards encoding in VSTM. This processing is described as a race for representation in one of the limited slots in VSTM. Stimuli race independently and in parallel. The race is influenced by many factors. Among them are the total number of elements competing for representation, the distribution of attention across the stimuli, and the categories to which the stimuli potentially belong.

In order to explain the formalization of this process, we proceed backwards from the arrival in VSTM to the appearance of the stimuli.

TVA assumes that the arrival times of stimuli in VSTM are exponentially distributed. Although the theory is fleshed out for multiple stimuli, the present approach is a simpler case: In the derivation proposed by Tünnermann et al. (2015) on the basis of TOJs, only two targets are encoded. Thus, the VSTM limitation can be ignored, which simplifies formalization. Returning to the event of encoding an object into VSTM, the probability that an object x is encoded before time t can then be expressed as the cumulative distribution function:

F(t) = \begin{cases} 1 - e^{-v_{x} (t - t_{0})} & \text{if } t > t_{0} \\ 0 & \text{otherwise} \end{cases}

(1)

The two cases that are distinguished in the equation emerge from the assumption that there is a maximal ineffective exposure duration t₀. This is the longest exposure interval that is still too short to provide enough sensory evidence for the race to start at all. If t ≤ t₀, there is no chance that the processing of x finishes, whereas for t > t₀ there is a chance that processing has been completed. This probability depends on the exposure duration and the processing rate υ_x. The unit of this rate is categorizations per second, and the rate is composed of:
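As a minimal numerical sketch of Equation 1, the following snippet evaluates the probability of encoding for a few exposure durations. The function name and the parameter values (a rate of 30 categorizations per second and a t₀ of 20 ms) are illustrative assumptions, not estimates from the experiments reported here.

```python
import math

def encoding_probability(t, v_x, t0):
    """Probability that object x has been encoded into VSTM by time t (Equation 1).

    t and t0 are given in seconds, v_x in categorizations per second.
    """
    if t <= t0:
        return 0.0  # exposure too short for the race to start
    return 1.0 - math.exp(-v_x * (t - t0))

# Hypothetical values chosen only for illustration.
for t_ms in (10, 20, 50, 100, 200):
    p = encoding_probability(t_ms / 1000.0, v_x=30.0, t0=0.020)
    print(f"t = {t_ms:3d} ms -> P(encoded) = {p:.3f}")
```

The probability stays at zero up to t₀ and then approaches 1 at a speed set by the processing rate.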

v_{x} = \sum_{i \in R} v (x, i)

(2)

The equation is based on the idea that different categorizations are possible for object x. The set R contains these categories, and the processing rate υ(x,i) expresses the speed of the particular categorization that x belongs to category i. This i can, for example, refer to the property of having a particular color or a certain orientation.

Descending deeper into the formalization, the processing rate is defined as:

v(x, i) = \eta(x, i) \, \beta_{i} \, \frac{w_{x}}{\sum_{z \in S} w_{z}}

(3)

This equation introduces three important factors: η(x,i), the strength of the sensory evidence that x belongs to category i; β_i, a decision bias for category i; and the relative attentional weight for x, given by its own weight ω_x divided by the sum of the weights of all objects in the visual field. All objects in the visual field are contained in the set S. The weights are defined by the weight equation:

w_{x} = \sum_{j \in R} \eta(x, j) \, \pi_{j}

(4)

which again includes the sensory evidence for x as η(x,j) and a new variable π_j, the pertinence value, which is a selection bias for category j. These products are summed over the set of all categories R.

The present approach concentrates on the parameters attentional weight ω, processing speed υ, and overall processing capacity C. The processing speed describes how quickly a representation in VSTM is built up. The sum of the processing speeds of all objects is the processing capacity. The attentional weight corresponds to the relative advantage of a stimulus and expresses how much attention is allocated to this object in comparison to the others. (The biases π and β are both held constant in the context of the present experiments and are hence not estimated.)
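How Equations 2 to 4 fit together can be sketched in a few lines. The function names, the category labels, and all numerical values for η, β, and π below are hypothetical and serve only to illustrate the bookkeeping. In the TOJ-based approach these quantities are not estimated individually.

```python
def attentional_weights(eta, pi):
    """Equation 4: w_x = sum over categories j of eta(x, j) * pi_j."""
    return {x: sum(eta[x][j] * pi[j] for j in pi) for x in eta}

def processing_rates(eta, beta, pi):
    """Equations 2 and 3: v_x = sum_i eta(x, i) * beta_i * w_x / sum_z w_z."""
    w = attentional_weights(eta, pi)
    total_w = sum(w.values())
    return {
        x: sum(eta[x][i] * beta[i] for i in beta) * w[x] / total_w
        for x in eta
    }

# Hypothetical two-object display with two categories.
eta = {
    "probe": {"tilted": 40.0, "upright": 5.0},
    "reference": {"tilted": 5.0, "upright": 40.0},
}
pi = {"tilted": 0.8, "upright": 0.2}    # pertinence (selection bias)
beta = {"tilted": 0.5, "upright": 0.5}  # decision bias

weights = attentional_weights(eta, pi)
rates = processing_rates(eta, beta, pi)
capacity = sum(rates.values())  # C: sum of all processing rates

print("weights:", weights)
print("rates:", rates)
print("capacity C:", capacity)
print("relative weight of probe:", weights["probe"] / sum(weights.values()))
```

With these made-up numbers the probe receives the larger share of the capacity because its sensory evidence coincides with the category of higher pertinence.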

Based on this admittedly swift introduction of the formalization the reader may deem TVA too cumbersome for dealing with comparably simple salience displays. This formalization, however, offers advantages. Firstly, TVA allows precise quantification and provides psychologically meaningful parameters, such as processing speed, which can be applied to a broad range of perceptual and attentional phenomena. Secondly, salience research can be related to other phenomena that have already been studied with TVA, such as, for example, feature-difference (bottom-up) and feature-relevance (top-down) interactions (Nordfang et al., 2013). Finally, because of its precise quantitative nature, the TVA framework can be used for generating quantitative hypotheses.

Modeling TOJ Data by TVA

TVA was initially applied to multi-element displays of highly overlearned stimuli, such as letters or numbers from which all or several belonging to a certain category had to be reported. The stimuli have to be masked to derive the assumed performance. Both features—highly overlearned and maskable stimuli—have so far restricted the general applicability of TVA. As already mentioned, Tünnermann et al. (2015) discussed a TOJ model derived from TVA equations which renders TVA applicable to all kinds of visual stimuli and also does away with the necessity of masking. They did so by introducing a temporal-order task and relating the psychometric functions derived from this task mathematically to the distributions assumed by TVA. In the following section, we will explain briefly how TOJ data can be modeled with TVA. For more detail, we refer the reader to the original article.

In the TOJ paradigm, the temporal order of two onsets has to be judged. We call these two targets T_probe and T_reference. In the experiments presented later, they will have different properties according to the experimental variable, but at present these names are just used to make them distinguishable. They appear with a variable interval between them. The dependent variable is the proportion of judgments for T_probe. If T_probe precedes T_reference with a large interval, judgments in favor of T_probe will be frequent. If the other stimulus leads, the proportion of judgments for T_probe will be low. If T_probe and T_reference are comparable, and the two stimuli are presented simultaneously, the participants’ performance should reach chance level.

However, subjective perception can deviate markedly from objective events. Such judgments can, for example, be systematically influenced by attention. If one of the stimuli is attended-to in advance, this stimulus will be perceived earlier. This phenomenon is called prior entry (Spence & Parise, 2010). In terms of the judgments, this effect becomes evident in an increased proportion of reporting the attended stimulus as being perceived first.

TOJ data can be fitted with psychometric functions. Possible mathematical descriptions of psychometric functions include the cumulative distribution of the normal distribution, logistic, Weibull, and Gumbel functions, of which the former two are most widely employed (for more formal descriptions and how to fit these functions see Kuss, Jäkel, & Wichmann, 2005; Wichmann & Hill, 2001a, 2001b). These functions have at least two parameters, the most important of which describe the center of the function and its slope. The center, at which both judgments are equally likely, is usually interpreted as the point of subjective simultaneity (though see Weiß & Scharlau, 2011). The slope is an indicator of discrimination performance. Importantly, it is a matter of debate which of the functions mentioned above should be used because none of them is particularly supported by theory. Hence, also the interpretation of the functions and their parameters is limited.
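For illustration, a logistic psychometric function with a center and a slope parameter can be written as follows. This is only a sketch of the conventional descriptive approach. The function name, the sign convention (negative SOAs mean that the probe leads), and the parameter values are assumptions made for this example.

```python
import math

def logistic_toj(soa_ms, pss_ms, slope_per_ms):
    """Proportion of "probe first" judgments under a logistic psychometric function.

    Assumed sign convention: negative SOAs mean that the probe leads, so the
    proportion falls as the SOA grows. pss_ms is the SOA at which both
    judgments are equally likely (the center of the function).
    """
    return 1.0 / (1.0 + math.exp(slope_per_ms * (soa_ms - pss_ms)))

# Hypothetical center of 10 ms and slope of 0.05 per ms.
for soa in range(-100, 101, 20):
    print(soa, round(logistic_toj(soa, pss_ms=10.0, slope_per_ms=0.05), 3))
```

The two parameters describe where the curve is centered and how steep it is, but neither of them maps onto a process assumed by a theory of attention.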

In contrast to psychometric functions, TVA offers parameters deeply rooted in psychological theory. As an additional advantage, they can also be interpreted readily. For instance, the parameter v corresponds to processing speed. Its unit is stimuli processed per second. This model carries more information than the point of subjective simultaneity and discrimination performance which measure only performance, not the processes that drive this performance.

Each data point of a psychometric function is equivalent to the proportion of one event being encoded first. This connection is illustrated in Figure 1 for the judgment of a salient and a non-salient stimulus (the main conditions in the experiments reported below). Each of the points, sampled from the psychometric function, depends on the process depicted above the function: According to the TVA-based model, each of the two bars represents a race to VSTM. The results of these two races are compared which determines the participant’s judgment. Each race is influenced by the objective onset and its speed. The process is, however, still a stochastic process—that is, these variables do not fully determine the outcome.

Figure 1. — Cognitive model. The bars in the upper part represent the races to VSTM. Formally, these races depend on the processing rates. The rates υ_sp and υ_sr from the salience condition of the experiments are shown exemplarily. The proportion of “salient first” judgments depends on the comparison of both races. SOA = Stimulus Onset Asynchrony.

As proposed by Tünnermann et al. (2015) the chance of onset T_probe being encoded first can be described with the parameters of TVA. It can be expressed by three parameters which include υ_p (the processing speed of T_probe), υ_r (the processing speed of T_reference), and Δt which incorporates the SOA and the maximal ineffective exposure duration as Δt = SOA + t₀^p − t₀^r, where t₀^p and t₀^r are the maximal ineffective exposure durations for the two stimuli. They are assumed to be equal in the context of the present experiments.

In terms of these parameters, the probability of T_probe being encoded first can be expressed as:

P_{p}(v_{p}, v_{r}, \Delta t) = 1 - e^{-v_{p} |\Delta t|} + e^{-v_{p} |\Delta t|} \left( \frac{v_{p}}{v_{p} + v_{r}} \right) \quad \text{for } \Delta t < 0

(5)

where 1 − e^(−υ_p|Δt|) is the probability that T_probe is fully encoded before T_reference starts its race to VSTM, and e^(−υ_p|Δt|) is the probability that T_probe is not yet encoded when T_reference starts its race. In the latter case, the probability of encoding T_probe first is given by Luce’s choice axiom:

\frac{v_{p}}{v_{p} + v_{r}} = \int_{0}^{\infty} v_{p} e^{-v_{p} t} \cdot e^{-v_{r} t} \, dt

For Δt ≥ 0 it holds that:

P_{p}(v_{p}, v_{r}, \Delta t) = e^{-v_{r} |\Delta t|} \left( \frac{v_{p}}{v_{p} + v_{r}} \right) \quad \text{for } \Delta t \geq 0

(6)

Here, analogously, e^(−υ_r|Δt|) denotes the probability that T_reference is not encoded before T_probe starts its race. If this happens, the probability of T_probe being encoded first is again given by Luce’s choice axiom.
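The two branches of the model can be sketched as a single function. The function name and the example rates are assumptions made for illustration. Rates are given per second and Δt in seconds, with negative values meaning that T_probe starts its race first.

```python
import math

def p_probe_first(v_p, v_r, delta_t):
    """Probability that T_probe is encoded first (Equations 5 and 6)."""
    luce = v_p / (v_p + v_r)  # outcome when both races run concurrently
    if delta_t < 0:
        # Probe has a head start: it either finishes within |delta_t|
        # or it is still racing when the reference starts.
        still_racing = math.exp(-v_p * abs(delta_t))
        return (1.0 - still_racing) + still_racing * luce
    # Reference has a head start: the probe can only win if the reference
    # is still racing when the probe starts.
    return math.exp(-v_r * abs(delta_t)) * luce

# Hypothetical rates: equal processing speed of 30 per second for both targets.
for soa_ms in (-100, -60, -20, 0, 20, 60, 100):
    p = p_probe_first(30.0, 30.0, soa_ms / 1000.0)
    print(f"delta_t = {soa_ms:4d} ms -> P(probe first) = {p:.3f}")
```

With equal rates the function passes through .5 at Δt = 0. Increasing υ_p relative to υ_r raises the curve at every interval, which is how an attentional advantage of the probe shows up in the judgments.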

To estimate the TVA parameters introduced in this section, a suitable statistical modeling is needed. We use Bayesian statistics for modeling and data analysis because Bayesian methods are particularly well-suited for inference under an assumed model (Little, 2006). We implemented a generative model based on the mathematical description of TVA, visualized in the hierarchical graphical Bayesian model of Figure 2. Table 1 shows how the variables (nodes) are formally defined. The graphical model describes the relation between the raw data and the TVA parameters on the group level. As an intermediate step, the TVA parameters are estimated per participant. The graphical model depicted in Figure 2 belongs to one group or condition in an experiment. Each further condition is modeled analogously. If there are at least two groups, their group parameters represented at the very top can be compared. On the group level, the mean of attentional weight is represented by node ω_sp^m. For technical reasons, the variance of the estimated attentional weight is represented as a separate variable node ω_sp^τ. Similarly, the capacity mean and variance are represented by the upper two C nodes. Additionally, we can infer the group-level processing speed for both targets as represented by the upper υ nodes. However, they do not provide additional information because they depend on the weight and capacity, as indicated by the direction of the arrows. For further information on the exact nature of the Bayesian parameter estimation process, please refer to Appendix A.

Table 1. Variables of the Hierarchical Bayesian Graphical Model (See Figure 2).

Variable	Explanation
ω_npj ∼ Normal(ω_np^m, ω_np^τ)	Attentional weight (probe)
ω_nrj = 1 − ω_npj	Attentional weight (reference)
ν_np = mean(ν_npj), j ∈ participants	Processing rate (probe)
ν_nr = mean(ν_nrj), j ∈ participants	Processing rate (reference)
C_nj ∼ Normal(C_n^m, C_n^τ)	Processing capacity
ν_npj = C_nj · ω_npj	Participant processing rate (probe)
ν_nrj = C_nj · ω_nrj	Participant processing rate (reference)
θ_nj,i ← P_p(ν_npj, ν_nrj, SOA)	Probability of “Probe first”
y_nj,i ∼ Binomial(θ_nj,i, n_nj,i)	Count of “Probe first” responses
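Read from top to bottom, Table 1 describes a generative process that can be simulated forward. The following sketch does this for one condition. It is not the estimation procedure itself (see Appendix A for that). The function names, the group-level values, the use of standard deviations for the Normal distributions, and the clipping of sampled values are assumptions made for this illustration. The trial numbers per SOA follow the design of Experiment 1, and Δt is set equal to the SOA because both t₀ values are assumed to be equal.

```python
import math
import random

# SOA in ms -> number of trials, following the design of Experiment 1.
TRIALS_PER_SOA = {-100: 24, -80: 24, -60: 32, -40: 32, -20: 48, 0: 48,
                  20: 48, 40: 32, 60: 32, 80: 24, 100: 24}

def p_probe_first(v_p, v_r, delta_t):
    """Equations 5 and 6; rates per second, delta_t in seconds."""
    luce = v_p / (v_p + v_r)
    if delta_t < 0:
        still_racing = math.exp(-v_p * abs(delta_t))
        return (1.0 - still_racing) + still_racing * luce
    return math.exp(-v_r * abs(delta_t)) * luce

def simulate_condition(n_participants, w_mean, w_sd, c_mean, c_sd,
                       trials_per_soa, rng):
    """Draw participant parameters and "probe first" counts for one condition."""
    data = []
    for _ in range(n_participants):
        # Participant-level weight and capacity (clipped to valid ranges,
        # which is a simplification of this sketch).
        w_p = min(max(rng.gauss(w_mean, w_sd), 0.01), 0.99)
        capacity = max(rng.gauss(c_mean, c_sd), 1.0)
        v_p = capacity * w_p          # participant rate, probe
        v_r = capacity * (1.0 - w_p)  # participant rate, reference
        counts = {}
        for soa_ms, n_trials in trials_per_soa.items():
            theta = p_probe_first(v_p, v_r, soa_ms / 1000.0)
            # Binomial draw as a sum of Bernoulli trials.
            y = sum(rng.random() < theta for _ in range(n_trials))
            counts[soa_ms] = (y, n_trials)
        data.append({"w_p": w_p, "C": capacity, "v_p": v_p, "v_r": v_r,
                     "counts": counts})
    return data

# Hypothetical group-level values: mean weight .5, mean capacity 60 per second.
rng = random.Random(0)
simulated = simulate_condition(4, 0.5, 0.05, 60.0, 10.0, TRIALS_PER_SOA, rng)
for participant in simulated:
    print(round(participant["w_p"], 3), round(participant["C"], 1),
          participant["counts"][0])
```

Parameter estimation runs this process in reverse: given the observed counts, it infers which weights and capacities are credible.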


The following four experiments test the viability of the proposed method in salience research. To this end, we combined TOJs with salience displays. In Experiment 1, the order of stimulus onsets had to be judged. This experiment was most similar to common TOJ experiments. In Experiment 2, stimulus offsets were judged, and the stimuli of Experiment 3 flickered for a short duration. We investigated whether salience increased processing speed and attentional weights. Finally, Experiment 4 was conducted to show the applicability to the luminance dimension as well as the sensitivity of the method.

Experiment 1

Experiment 1 is based on the hypothesis that the onset of an orientation singleton achieves an increased attentional weight and is hence encoded to VSTM more quickly. It was carried out as a proof of concept to show that TVA can be successfully applied to salience research via the general TOJ method outlined by Tünnermann et al. (2015). To this end, it had to meet the requirements of both salience studies and TOJ research, requiring us to combine multi-element displays from salience research with temporally distributed targets in the most direct way possible.

The participants judged the temporal order in which two targets appeared in a display of 17 × 17 bars. A center section of these displays is exemplarily shown in Figure 3. The salience display consisting of homogeneous background stimuli was shown first. The targets appeared later. One of the targets could differ in orientation whereas the other one was always non-salient—that is, of the same orientation as the background elements.

Figure 3. — Visualization of the stimulus sequence of Experiments 1 to 4. Stimuli are identical to those of the experiments, but displays have been scaled for visibility. The salience display was shown 150 ms before the probe event. The event to be judged was the onset (Experiment 1), offset (Experiment 2), or flicker (Experiments 3 and 4; depicted as white coronae). Only the salience conditions are shown. These conditions comprise a salient probe stimulus. The neutral conditions of the experiments featured a non-salient probe stimulus equal to the reference stimulus. These conditions are not depicted. The arrow depicts the flow of time. SOA = Stimulus Onset Asynchrony.

This combination of multi-element displays and stimulus onsets is the direct way of checking the applicability of the method. Unfortunately, however, it is questionable whether target onsets allow salience effects to show up. Firstly, the blanks at the locations of the future targets may act as salient stimuli because they violate the background pattern (Li, 2002). Secondly, results on the temporal course of salience suggest that salience is used to gradually distribute attention over the display (Dombrowe et al., 2010): After a 30 ms delay, the salience effect is very small in comparison to its peak at 120 ms. Salience information thus might not be available initially. Finally, the onset information may be so strong that it masks any effects of salience. Because the present experiment serves as a proof of concept, this is no severe disadvantage. If the methodology works as expected, we will be able to precisely describe the reported temporal order with the help of the proposed model independent of whether an effect of salience is present on the group level. Following this proof of concept, Experiments 2 and 3 will look into effects of salience themselves.

Method

Participants

A total of 20 students at Leuphana University of Lüneburg (5 male and 15 female; M_age = 23.9 years, range 20-33) participated in Experiment 1. Seven participants took part in an additional session and one participant in three sessions. Within Bayesian methodology, such variation can be taken into account in the parameter estimation for the individual participants, which improves precision. The higher precision on the individual level also affects the parameter estimation on the group level. All participants reported normal or corrected-to-normal visual acuity and received a payment of 8 Euro per hour.

Apparatus

The experiment was conducted in a dimly lit experimental booth.

A Windows 7 computer with a dedicated graphic card and an Iiyama Vision Master Pro512 22 inches (40.4 cm × 30.3 cm) CRT monitor was used for stimulus presentation. The refresh rate was set to 100 Hz, the resolution to 1,024 × 768 pixels with 32-bit colors. The vsync signal was used for timing the experiment. The experiment was programmed using PsychoPy (Peirce, 2007). The distance to the screen was 50 cm. Participants responded with the hand corresponding to the location that had to be reported. The control key on the bottom left and the enter key on the bottom right corner of the keyboard were used for responses.

Stimuli

Each trial started with a fixation cross in the center of the screen. After a delay of 900 ms, the participants saw a 17 × 17 array of bars. The array corresponded to 34.99° × 34.99° of visual angle. Bar length was 1.07° of visual angle and width 0.18°. The fixation cross occupied the middle of the array. The background color of the screen was set to gray, RGB (96, 96, 96) equivalent to 6.98 cd/m² , while bars and fixation cross were white, RGB (224, 224, 224) equivalent to 66.2 cd/m² . Each bar stimulus belonged to one of three logical categories which were not necessarily visually distinguishable. These categories are background elements, target T_reference and target T_probe. While the background elements and T_reference were always homogeneously oriented, the orientation of T_probe varied between a 0° difference to the background in the neutral condition and the maximal orientation contrast of 90° in the salience condition. The orientation of the non-salient elements was chosen randomly for each trial. The targets were presented at fixed positions on the left and right of the fixation cross with an eccentricity of 8.24° of visual angle. Both positions were empty when the array was initially presented. T_probe was always presented 150 ms after the onset of the array of background elements. This duration was not jittered because salience effects decay over time as reported by, for example, Donk and van Zoest (2008), and the TOJ required a temporal window of -100 ms to +100 ms around this value. T_reference was shown with an SOA of -100, -80, -60, -40, -20, 0, 20, 40, 60, 80 and 1 ms, respectively. After a display duration of 300 ms, all bars vanished. The number of trials varied with the SOA because the variance is expected to increase towards the 0 SOA. Twenty-four trials were present for each of the -100, -80, 80, and 100 ms SOA, 32 trials for the -60, -40, 40 and 60 ms SOA, and 48 trials for the -20, 0, and 20 ms SOA. 
The participants had to respond via a keystroke with either the left ctrl or the right enter key. The side at which T_probe appeared was chosen randomly. The next trial started automatically with a delay of 1 s with a 100 ms jitter.
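The SOA schedule described above can be summarized in a few lines. The sketch below tabulates the trial counts per SOA and converts the reported durations into frame counts at the 100 Hz refresh rate. The variable and function names are hypothetical; whether the listed counts apply per condition or to the experiment as a whole is not stated in this section, so the total is simply the sum over the SOA set.

```python
REFRESH_HZ = 100                 # refresh rate from the apparatus description
FRAME_MS = 1000 // REFRESH_HZ    # one frame lasts 10 ms

# Trials per SOA (ms) as listed in the text.
TRIALS_PER_SOA = {
    -100: 24, -80: 24, 80: 24, 100: 24,
    -60: 32, -40: 32, 40: 32, 60: 32,
    -20: 48, 0: 48, 20: 48,
}


def ms_to_frames(duration_ms):
    """Convert a duration to whole frames; reject values off the frame grid."""
    frames, remainder = divmod(duration_ms, FRAME_MS)
    if remainder:
        raise ValueError(f"{duration_ms} ms is not a multiple of {FRAME_MS} ms")
    return frames


# (SOA in ms, |SOA| in frames, number of trials), ordered by SOA.
schedule = [(soa, ms_to_frames(abs(soa)), n)
            for soa, n in sorted(TRIALS_PER_SOA.items())]
total_trials = sum(TRIALS_PER_SOA.values())
probe_onset_frames = ms_to_frames(150)   # T_probe onset after array onset
```

All reported durations fall on the 10 ms frame grid, so every SOA corresponds to a whole number of frames.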

Procedure

Participants were instructed to fixate the cross in the center of the screen throughout each trial. Their task was to report which element occurred first, the left or the right one, and press the left or right key, respectively. There was no time pressure. The experiment started with a training phase of 40 trials that included feedback about errors. There was no feedback after the training. After every 50 trials, a break was initiated, which participants ended with a keypress. The experiment lasted approximately 45 min.

Results

The judgments of whether the left or the right stimulus appeared first were converted into judgments of whether T_probe appeared first. Recall that T_probe is the stimulus that stands out from its surroundings in the salience condition.

As can be seen in Figure 4, the participants generated typical sigmoid TOJ data. All individual data showed this pattern which allowed us to apply the model (see the section “Modeling TOJ data by TVA” for details).
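To illustrate why such data take a sigmoid-like shape under a TVA-based account, the following minimal sketch computes the probability of a "probe first" judgment from two processing rates, assuming that each stimulus is encoded after an exponentially distributed time and that the stimulus encoded first is judged as first. This is an illustrative assumption here; the authoritative formulation is the one given in the section "Modeling TOJ data by TVA". To avoid committing to a sign convention for the SOA, the hypothetical argument `probe_lead_ms` is positive when T_probe precedes T_reference.

```python
import math


def p_probe_first(probe_lead_ms, v_probe_hz, v_ref_hz):
    """Probability that the probe is encoded before the reference.

    Assumes an exponential race between the two stimuli (illustrative).
    probe_lead_ms > 0: probe onset precedes reference onset.
    """
    lead_s = probe_lead_ms / 1000.0
    # Probability that the probe wins once both stimuli are racing.
    race_share = v_probe_hz / (v_probe_hz + v_ref_hz)
    if lead_s >= 0:
        # The probe may finish alone during its head start ...
        not_yet_encoded = math.exp(-v_probe_hz * lead_s)
        # ... otherwise both race from the reference onset onward.
        return (1.0 - not_yet_encoded) + not_yet_encoded * race_share
    # The reference had the head start and must not have finished yet.
    not_yet_encoded = math.exp(-v_ref_hz * -lead_s)
    return not_yet_encoded * race_share


# Example: equal rates of 25 Hz across the tested range of asynchronies.
curve = [p_probe_first(lead, 25.0, 25.0) for lead in range(-100, 101, 20)]
```

With equal rates the curve passes through .5 at simultaneity; unequal rates shift this value toward the stimulus with the higher rate.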

Bayesian statistics yields a full probability distribution of the model parameters, a point estimate of the parameter, which is provided by the mode of the respective distribution, and an easily interpretable measure of the certainty with which the parameter was estimated. Broad probability distributions correspond to vague estimates. This information is expressed by the highest density interval (HDI) of the distribution, the narrowest interval on the x-axis that contains 95% of the probability mass and hence the most probable parameter values.
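As an illustration of how an HDI can be obtained from posterior samples, the sketch below searches the sorted samples for the narrowest window that contains the requested share of them. This is a common sample-based approximation for unimodal distributions and is not necessarily the routine used for the analyses reported here; the function name is hypothetical.

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    n = sorted_samples.size
    window = int(np.ceil(mass * n))   # number of samples inside the interval
    if window < 2 or window > n:
        raise ValueError("need more samples or a mass in (0, 1]")
    # Width of every candidate interval that spans `window` samples.
    widths = sorted_samples[window - 1:] - sorted_samples[:n - window + 1]
    start = int(np.argmin(widths))
    return sorted_samples[start], sorted_samples[start + window - 1]


# Example with simulated draws (not the experimental posterior).
rng = np.random.default_rng(0)
low, high = hdi(rng.normal(loc=0.0, scale=1.0, size=20000))
```

A value such as .5 is then said to lie outside the HDI if it is smaller than the lower bound or larger than the upper bound.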

The most interesting variables in the hierarchical Bayesian graphical model are on the group level because they allow us to compare the salience and neutral conditions. The relation between the weight for T_probe in the salience condition, ω_sp, and its counterpart in the neutral condition, ω_np, shows whether salience influences attentional parameters. The parameter distributions for the weights are depicted in Figure 5. The parameter estimates ω_sp = .507 and ω_np = .516 differ only slightly. Interestingly, the value .5 is not among the 95% most probable parameter values for ω_np; that is, attention is not distributed equally across the two targets in the neutral condition. Because all elements were equally salient in this condition, visual properties cannot be the cause of the higher attentional weight for T_probe. The temporal properties, however, offer an explanation: T_probe was always shown 150 ms after display onset. This fixed interval made it predictable. In order to measure the effect of salience unbiased by that of temporal expectation, we subtracted the deviation of ω_np from the expected neutral weight of .5 from ω_sp. The corrected weight is ω_sp_clean = .493. The correction shifts the weight of the salience condition in the direction opposite to the expected one, which would be an increased weight for the salient stimulus. As explained earlier, the effect is small, and hence ω_np and ω_sp_clean again differed only slightly.
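The correction for temporal expectation amounts to a single subtraction, which the following sketch spells out for the two reported point estimates. The function name is hypothetical. Applied to the point estimates, the subtraction gives approximately .491 rather than the reported .493; one possible reason, which is only an assumption here, is that the reported value was derived from the full parameter distributions rather than from the two modes.

```python
def correct_for_expectation(w_salient, w_neutral, unbiased=0.5):
    """Remove the neutral condition's deviation from the unbiased weight."""
    return w_salient - (w_neutral - unbiased)


# Group-level point estimates reported in the text.
w_sp, w_np = 0.507, 0.516
w_sp_clean = correct_for_expectation(w_sp, w_np)   # approximately 0.491
```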

Figure 5. — Estimated attentional weights (ω) for the probe stimuli of Experiment 1, salience condition (ω_sp = weight for the salient probe) in blue and neutral condition (ω_np = weight for the neutral probe) in red. The weights for the reference stimuli are 1 minus the weight of the respective probe.

The processing rates for the stimuli are very similar. All are in the range of 23.3 Hz to 24.9 Hz. This result is to be expected when both weights and capacities are similar (see Figure 6).
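The link between weights, capacity, and rates can be made explicit with a short calculation. The sketch below assumes that, with two targets, the overall capacity C is split according to the relative weights, so that the probe receives C · ω and the reference C · (1 − ω). It uses the group-level point estimates reported in this section; the function name is hypothetical. Products of point estimates need not coincide exactly with the modes of the estimated rate distributions, so small deviations from the reported range are to be expected.

```python
def processing_rates(capacity_hz, w_probe):
    """Split the capacity C between probe and reference by relative weight."""
    v_probe = capacity_hz * w_probe
    v_reference = capacity_hz * (1.0 - w_probe)
    return v_probe, v_reference


# Point estimates from the text: (capacity in Hz, probe weight).
v_sp, v_sr = processing_rates(49.4, 0.507)   # salience condition
v_np, v_nr = processing_rates(48.1, 0.516)   # neutral condition
```

Because both the capacities and the weights are close to each other across conditions, all four products fall within about 2 Hz of one another.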

Figure 6. — Estimated processing rates (υ) for Experiment 1. The processing rates of the salience condition (υ_sp = processing rate for the salient probe; υ_sr = processing rate for the reference in the salient probe displays) are shown in blue, those of the neutral condition (υ_np = processing rate for the neutral probe; υ_nr = processing rate for the reference in the neutral probe displays) in red. The darker distributions belong to the probe stimulus and the lighter distributions belong to the reference stimulus.

The processing capacity was similar in both conditions with C_s = 49.4 Hz and C_n = 48.1 Hz (see Figure 7). The distribution of the difference between the two capacities is centered on 0; hence, a substantial difference is very unlikely. Importantly, this allows one to compare the attentional weights across conditions because it can be assumed that the same process distributes the same resources differently in the two conditions.

The posterior predictive (Figure 4) serves two purposes: It is a plausibility check of the model and compresses the evidence for the parameters in a prediction for new data. Because the parameters are given as distributions, the certainty of the predicted data can be indicated by the color gradient in the figure. For the current experiment, the conditions are strongly overlapping—that is, salience does not affect processing speed or attentional weights, and consequently the judgments are similar in both conditions.

Discussion

Staying close in design to the well-established TOJ paradigm while using multi-stimulus displays yielded plausible data that resembled psychometric functions. The TVA-based model was successfully applied to model the data. It was possible to estimate parameter distributions for individual participants as well as on the group level. The estimated processing rates are comparable to what has been found in earlier TVA studies (e.g., Finke et al., 2005). In sum, this allows us to use TOJs on multi-element displays in order to compute TVA-based attentional parameters.

Although one stimulus was clearly salient due to its 90° orientation difference, this salience increased neither its attentional weight nor its processing rate in comparison to its counterpart from the neutral condition. Salience thus had no influence on the distribution of attention as measured by TVA parameters. This result cannot be attributed to a lack of sensitivity: The fact that the neutral weight (.5) was located outside of the HDI for the neutral condition (likely due to the fixed time of the T_probe onset) indicates the sensitivity of the approach. That is, if present, even small differences between attentional parameters of T_reference and T_probe should have been detected.

The absence of a salience effect on attentional parameters might be explained by the lack of a delay between the property which is supposed to guide attention (the local contrast) and the events which are relevant for the TOJ—that is, the onsets. TVA assumes that the sensory evidence for onset and local contrast is available equally fast. In the V1-salience model by Li (2002), however, it is assumed that salience is computed by pyramidal cells and interneurons that interact locally and reciprocally in their layer. The onset, however, can be processed by a simple feed-forward network (VanRullen & Koch, 2003). If the sensory evidence for salience is indeed not available fast enough, this would explain why the attentional weights are unaffected by salience. This explanation also fits the results of Dombrowe et al. (2010) on the time course of salience.

The following experiments changed the temporal feature of the targets. The events to be judged are target offsets in Experiment 2 and brief flickers in Experiment 3.

Experiment 2

In Experiment 2, the onsets used in Experiment 1 were replaced with offsets. Offsets are susceptible to attentional effects (Vingilis-Jaremko, Ferber, & Pratt, 2008). We hypothesized that the presence of the salience-generating property prior to the event (offset) should cue the event and hence lead to a higher attentional weight. Again, this should lead to a quicker encoding into VSTM. The offset at the potentially salient position occurred 150 ms after the onset of the display. As shown by Donk and Soesman (2010), effects of orientation salience should be present in this time range.