Journal of Neurophysiology. 2021 Apr 28;125(6):2125–2134. doi: 10.1152/jn.00275.2020

Similar masking effects of natural backgrounds on detection performances in humans, macaques, and macaque-V1 population responses

Yoon Bai 1,2, Spencer Chen 1, Yuzhi Chen 1, Wilson S Geisler 1,2, Eyal Seidemann 1,2,3
PMCID: PMC8285657  PMID: 33909494

Keywords: detection, macaque, natural images, primary visual cortex, VSDI

Abstract

Visual systems evolve to process the stimuli that arise in the organism’s natural environment, and hence, to fully understand the neural computations in the visual system, it is important to measure behavioral and neural responses to natural visual stimuli. Here, we measured psychometric and neurometric functions in the macaque monkey for detection of a windowed sine‐wave target in uniform backgrounds and in natural backgrounds of various contrasts. The neurometric functions were obtained by near‐optimal decoding of voltage‐sensitive-dye-imaging (VSDI) responses at the retinotopic scale in primary visual cortex (V1). The results were compared with previous human psychophysical measurements made under the same conditions. We found that human and macaque behavioral thresholds followed the generalized Weber’s law as a function of background contrast, and that both the slopes and the intercepts of the threshold functions match each other up to a single scale factor. We also found that the neurometric thresholds followed the generalized Weber’s law with slopes and intercepts matching the behavioral slopes and intercepts up to a single scale factor. We conclude that human and macaque abilities to detect targets in natural backgrounds are affected in the same way by background contrast, that these effects are consistent with population decoding at the retinotopic scale by downstream circuits, and that the macaque monkey is an appropriate animal model for gaining an understanding of the neural mechanisms in humans for detecting targets in natural backgrounds. Finally, we discuss limitations of the current study and potential next steps.

NEW & NOTEWORTHY We measured macaque detection performance in natural images and compared their performance to the detection sensitivity of neurophysiological responses recorded in the primary visual cortex (V1), and to the performance of human subjects. We found that 1) human and macaque behavioral performances are in quantitative agreement and 2) are consistent with near‐optimal decoding of V1 population responses.

INTRODUCTION

The macaque monkey is an important animal model of the human visual system. The anatomy and physiology of their eyes, retinae, and early visual areas are similar to those of humans. Also, behavioral studies demonstrate similarity of human and macaque performance in visual detection and discrimination tasks, including similarity in their foveal contrast sensitivity functions (1, 2), spatial resolution across the visual field (3), foveal wavelength discrimination functions (4), binocular summation (5), and binocular depth discrimination (6).

Recently there have been a number of studies measuring human ability to detect targets in natural backgrounds (7, 8), which is critical for testing whether hypotheses derived from experiments with simple stimuli generalize to natural environments. These studies have shown that human thresholds follow a generalized Weber’s law:

$$c_t^2 = k_b c_b^2 + c_0^2 \tag{1}$$

where $c_t$ is the root-mean-squared (RMS) contrast of the target at threshold, $c_b$ is the RMS contrast of the background, $k_b$ is the slope parameter, and $c_0^2$ is the intercept parameter ($c_0$ is the threshold on the uniform background).
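
To make the shape of Eq. 1 concrete, the following minimal sketch evaluates the threshold at the four background contrasts used in this study. The parameter values are hypothetical and chosen only for illustration; they are not fitted values from the paper.

```python
import math

def weber_threshold(c_b, k_b, c_0):
    """Target RMS contrast at threshold, from Eq. 1: c_t^2 = k_b * c_b^2 + c_0^2."""
    return math.sqrt(k_b * c_b ** 2 + c_0 ** 2)

# Hypothetical parameters (not taken from the paper), contrasts as fractions.
k_b, c_0 = 0.05, 0.01
for c_b in (0.0, 0.01875, 0.0375, 0.075):
    print(f"c_b = {c_b:.5f} -> c_t = {weber_threshold(c_b, k_b, c_0):.4f}")
```

On a uniform background (c_b = 0) the threshold reduces to c_0, and for high background contrasts it grows approximately in proportion to c_b.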

There have been no studies quantitatively measuring macaque ability to detect targets in natural backgrounds (or other naturalistic backgrounds). Here, we take an initial step in this direction by comparing macaque and human psychometric functions for detecting a spatially localized sinewave target in uniform backgrounds and in natural backgrounds of several different contrasts. In a separate experiment, we measured neural population responses in fixating monkeys to these stimuli in primary visual cortex (V1) using voltage-sensitive dye imaging (VSDI), which provides a real‐time measure of the local membrane potential responses over the entire region activated by the target stimulus. From these population responses, we determined neurometric functions under the assumption of near optimal decoding at a coarse (retinotopic) spatial scale (9, 10).

We hypothesized that the neural processing underlying contrast masking effects in natural backgrounds is the same in humans and macaques, and that the generalized Weber’s law is the result of neural processing in the early visual pathway (retina and V1). If so, we expect Eq. 1 to hold, up to a single overall scale factor, for human behavior, macaque behavior, and macaque V1 population decoding. In other words, we predict that a single scale factor should align both the slopes and the intercepts of human and macaque threshold functions, and another single scale factor should align the slopes and intercepts of macaque and V1-neurometric threshold functions. This is a strong prediction because the slopes and intercepts need not vary together. However, we expect the overall scale factor (efficiency) to vary because of differences between humans and macaques in parafoveal acuity and in post-V1 processing (e.g., decision noise), and because of measurement noise in the V1 population responses as measured by VSDI.

We find that macaque and human behavioral thresholds measured under the same conditions are well‐aligned by applying a single overall scale factor, and that macaque behavioral and V1-neurometric thresholds are well‐aligned by applying a single overall scale factor. Thus, the results imply equivalent neural computations in humans and macaques, and that human and macaque thresholds can be explained by fixed‐efficiency decoding of the retinotopic population responses in V1.

MATERIALS AND METHODS

The results reported here are based on methods that have been described in detail previously (9, 11, 12). Here we focus on details that are of specific relevance to the current study. All procedures have been approved by the University of Texas at Austin Institutional Animal Care and Use Committee and conform to National Institutes of Health standards.

Detection Task

We trained three macaques to detect a small horizontal Gabor target (σ = 0.14°, 4 cpd) in natural image backgrounds. The natural image backgrounds were 4° in diameter, were windowed with a raised-cosine function (Fig. 1A), and were centered on the location (2.5° eccentricity) where the target would appear when present (Fig. 1B). We used 50 of the same natural‐image backgrounds used in a study measuring human detection thresholds (7). The backgrounds were 512 pixels in diameter and were randomly sampled from 10 large (4284 × 2844 pixels; 40 × 27 degrees) calibrated images of natural scenes that contained no human‐made objects (see Ref. 7 for details). The backgrounds were adjusted to have a Gaussian gray‐scale histogram, and their contrasts were adjusted to 0% (uniform background), 1.875%, 3.75%, and 7.5% root‐mean‐squared (RMS) contrast (in one of the monkeys we only tested background contrasts of 0%, 3.75%, and 7.5%). Background mean luminance was set to the average luminance of the uniform gray background (30 cd/m2). For each background contrast, we structured trials into blocks such that for each block the contrast of the Gabor target was fixed and the target was present on 50% of the trials. For each block, we randomly chose the target contrast from a list of fixed values that ranged from 0.74% to 7.1% RMS contrast (3.125% to 30% Michelson contrast).

Figure 1.

Target detection task. A: each natural‐background patch was cropped and windowed from one of the natural scene images used in Bradley et al. (7). In the behavioral experiments 50 backgrounds were used. Shown here are the 10 backgrounds used in the VSDI experiments, which were picked to be diverse and representative. Images are displayed at 15% root-mean-square (RMS) contrast in this figure. B: background patches were 4° in diameter. The target was a horizontal Gabor (σ = 0.14°, 4 cpd). In this example, a 6% RMS contrast target is added to a natural image background with an RMS contrast of 15%. The target stimulus is shown separately in the top right corner. C: detection and fixation tasks. Detection task: detection performance was measured in a single‐interval forced‐choice task. Each trial began with a brief audible cue and a fixation point that was centered on a uniform gray background. The monkey was required to initially maintain fixation for a duration that randomly varied from 1,300 to 1,600 ms. Following this period, the fixation point dimmed and the stimulus was presented at 2.5° eccentricity. In target‐present trials, the monkey was required to saccade to the target location and maintain gaze at the target location for an additional 200 ms to receive a reward. The monkey was allowed to saccade to the target anytime from 75 ms to 600 ms after stimulus onset. Stimulus duration was up to 200 ms. The stimulus was turned off as soon as the monkey’s gaze left the fixation window. At the time of stimulus offset, a circle cue was presented at the target location to help maintain fixation following the saccade. In target-absent trials, the monkey had to ignore the circle and maintain fixation at the location of the fixation point for 600 ms after stimulus onset to receive a juice reward. Fixation task: in the fixation task, the monkey was required to maintain fixation during the entire trial. 
After the initial fixation phase, the stimulus was presented while the monkey continued to hold fixation. For monkeys 1 and 2, the stimulus presentation duration was 100 ms and 200 ms, respectively. VSDI, voltage‐sensitive dye imaging.

Each trial began with a brief audible cue and a fixation point that was centered on a uniform gray background displayed on a CRT (cathode-ray tube) monitor positioned at a distance of 108 cm. The monkey was required to maintain fixation for 1,300–1,600 ms to ensure that it was fully engaged in the task. Following this period, the fixation point dimmed for 300 ms and then the stimulus was presented at a parafoveal location (2.5° eccentricity). The stimulus consisted of a natural background either with or without the Gabor target. In target‐absent trials, the monkey was required to maintain fixation for an additional 1 s to receive a juice reward. Trials were categorized as “false alarms” when the monkey shifted gaze to the center of the target location. In target‐present trials, the monkey was required to saccade to the target location and maintain gaze at the target location for an additional 200 ms to receive reward. The report time window was 75 ms to 600 ms after stimulus onset. Target-present trials were categorized as “misses” when the monkey held fixation beyond the report time window. In relatively few cases (< 5%), we aborted the trial when the monkey did not hold gaze precisely (1° diameter fixation window) or when the monkey made a saccade to an arbitrary location. These aborted trials were repeated at the end of the block. Across target-present trials, the monkeys’ median reaction times were ∼200 ms. The visual stimulus was presented for up to 200 ms but was switched to uniform gray as soon as the monkey initiated a saccade. Target‐present and target‐absent trials were randomly interleaved and the duration of each trial lasted from 2 to 3 s, with a fixed intertrial interval of 2.5 s for correct responses. Audible feedback was given at the end of each trial. For correct trials (hits and correct‐rejections), a juice reward accompanied a positive audible feedback. 
For misses and false alarms, an extra 3-s intertrial interval was added with a negative audible feedback (for a fixed experiment duration, incorrect trials reduce the total amount of juice reward).

The numbers of behavioral sessions for the three monkeys were 16, 19, and 16. In each session, two of the four background contrasts were tested: five blocks of trials for one background contrast followed by five blocks of trials for the second background contrast. Each block used a different target contrast, randomly selected without replacement. Each block contained 100 trials (each of the 50 backgrounds was presented once with the target present and once with the target absent).

Estimating Behavioral Detection Performance

Psychometric functions were estimated separately for each monkey and background contrast. To take into account the effect of criterion bias, we calculated detectability (d′) values for each stimulus condition using the standard formula,

$$d' = \Phi^{-1}(p_h) - \Phi^{-1}(p_{fa}) \tag{2}$$

where $p_h$ and $p_{fa}$ are the proportions of hits and false alarms, and $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function $\Phi$. For each monkey and each stimulus condition, the d′ values were averaged across experiment sessions and converted to an effective (maximum) percent correct:

$$PC_{max} = \Phi\left(\frac{d'}{2}\right) \tag{3}$$
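
Eqs. 2 and 3 can be computed with the Python standard library alone. The sketch below is illustrative; the hit and false-alarm rates in the usage note are hypothetical, not data from the study.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def d_prime(p_hit, p_fa):
    """Eq. 2: d' = Phi^-1(p_h) - Phi^-1(p_fa).

    Rates of exactly 0 or 1 have no finite inverse and raise an error here.
    """
    return _STD_NORMAL.inv_cdf(p_hit) - _STD_NORMAL.inv_cdf(p_fa)

def pc_max(d):
    """Eq. 3: bias-corrected (maximum) proportion correct, Phi(d'/2)."""
    return _STD_NORMAL.cdf(d / 2.0)
```

For example, `pc_max(d_prime(0.8, 0.2))` returns 0.8 up to rounding, because an unbiased observer with 80% hits and 20% false alarms is already at the maximum percent correct, and `pc_max(1.0)` is about 0.69, the threshold criterion used in this study.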

These accuracy values were then fit with a descriptive function. This descriptive function was the UNI (uncertain normal integral) function, which is similar to the familiar Weibull function but more principled (see Ref. 13),

$$f(c) = (1 - 2\lambda)\,\Phi\left[\frac{1}{2}\ln\left(\frac{\exp(\alpha c) + \beta}{1 + \beta}\right)\right] + \lambda \tag{4}$$

where c is the RMS contrast of the target, α is a parameter that is dependent on the background contrast, β is a parameter that varies with the level of intrinsic position uncertainty of the target, and λ is the lapse rate. In this study, β was fit as a constant across all conditions, because intrinsic position uncertainty varies with retinal location (14) and the target (when present) was always positioned at the same retinal location. Lapse rate was also fixed for a given subject (or when fitting the average across subjects). Thresholds were defined to be the target contrast giving a percent correct (PCmax) of 69% (d′ = 1).
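
The descriptive function in Eq. 4 and the threshold criterion can be sketched as follows. The closed-form threshold assumes that the criterion d′ = 1 is applied to the lapse-free part of the function, so that ln[(exp(αc) + β)/(1 + β)] = 1; the parameter values in the usage note are hypothetical.

```python
import math
from statistics import NormalDist

def uni(c, alpha, beta, lapse):
    """Eq. 4: proportion correct as a function of target RMS contrast c."""
    d = math.log((math.exp(alpha * c) + beta) / (1.0 + beta))
    return (1.0 - 2.0 * lapse) * NormalDist().cdf(0.5 * d) + lapse

def threshold(alpha, beta):
    """Contrast at which d' = 1 (about 69% correct), ignoring the lapse rate.

    Solves ln((exp(alpha * c) + beta) / (1 + beta)) = 1 for c.
    """
    return math.log(math.e * (1.0 + beta) - beta) / alpha
```

With hypothetical values `alpha = 100` and `beta = 5`, `uni(threshold(100, 5), 100, 5, 0.0)` evaluates to about 0.69, and `uni(0.0, 100, 5, lapse)` is 0.5 for any lapse rate, as expected for a target of zero contrast.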

VSD Imaging

Wide‐field imaging with voltage‐sensitive dyes was used to record neural population activity at a high resolution in space and time (15). Before each imaging experiment, voltage‐sensitive dyes (RH 1691 or RH 1838) were topically applied to the cortex through a surgically implanted chamber. Measurements were made after an ∼2-h waiting period, which allowed the VSD molecules to bind with neural membranes. Fluorescence from neural activity was recorded using a commercial imaging system (Optical Imaging, Inc.). The imaging system was configured to record from a cortical region of ∼8 × 8 mm, capturing V1 population responses over the whole region where activity is elicited by the Gabor target. Imaging data were collected at 100 Hz or 110 Hz, where each frame was 512 × 512 pixels. VSD molecules were excited by light at 630 nm. Fluorescence signals were measured through a dichroic mirror (650-nm long‐pass filter) and an emission filter (RG 665). Stimulus presentation and data acquisition were synchronized with the monkey’s EKG (electrocardiogram) signal to minimize trial‐to‐trial variations due to cortical pulsations from the heartbeat. VSD responses are a linear function of the locally integrated subthreshold neural activity from dendrites and axons in the superficial layers of the cortex (16, 17). More details about optical imaging with VSD in behaving monkeys are described elsewhere (11, 18, 19).

VSDI Fixation Task

VSDI measurements were made in two monkeys (different from those in the detection task) trained to perform a fixation task (Fig. 1C, bottom). Each imaging trial began with an audible cue and a small fixation point presented on a uniform blank screen. The monkey was required to first hold fixation for a duration that randomly varied between 1,300 and 1,600 ms. The stimulus was presented at the end of this initial fixation phase. The monkey was required to hold fixation during both the initial and the stimulus phase. Stimulus presentation was somewhat different for the two monkeys. For monkey 1 the stimulus duration was 100 ms, and for monkey 2 it was 200 ms. Although the durations were different, the neural responses were integrated over the same 100-ms time window, which was delayed from the onset of the stimulus by 50 ms to account for the latency of the cortical response (see Fig. 2). Also, recall that the median reaction time of the monkeys in the detection experiment was ∼200 ms (50 ms after the end of the 100-ms integration window). The monkey did not receive a juice reward if fixation was broken during either the initial or stimulus phase of the trial.

Figure 2.

Example VSDI imaging session from monkey 1. Each row (A–D) shows the average VSDI response for a particular background (BG) contrast (0%, 1.875%, 3.75%, 7.5% RMS), when the target is present and absent. The first two columns show the average responses when the target contrast is 9.5% RMS and 0% RMS (10 trials each), and the third column shows the difference in these average responses. The response images were derived by integrating for 100 ms beginning at response onset (which occurred at a fixed latency after stimulus onset). The last column shows the mean time courses in the central region outlined by the red box (1 × 1 mm2). Time courses were plotted after subtracting the mean time course of background‐only trials. The gray rectangular shading indicates the stimulus presentation and the horizontal black bar indicates the integration interval (100 ms). Mean time courses are color‐coded to represent the range of target contrasts used in the experiment (0%, 0.7%, 1.5%, 3%, 6%, and 9.5% RMS). Circular apertures are used for display purposes only. RMS, root mean square; ROI, region of interest; VSDI, voltage‐sensitive dye imaging.

We used the same levels of masking contrast used in the detection task. The RMS contrasts of natural background images were 0%, 1.875%, 3.75%, and 7.5% for monkey 1, and 0%, 3.75%, and 7.5% for monkey 2. For each background contrast, we presented six different levels of target contrast (Gabor target RMS contrast: 0%, 0.74%, 1.5%, 3%, 6%, and 9.5%; Michelson contrast: 0%, 3.125%, 6.25%, 12.5%, 25%, and 40%). Each target contrast was presented 10 times in random order. As a result, 60 trials were presented at a particular background contrast. In each VSD imaging session, we measured responses for a 0% (uniform) background and for one higher contrast background. For monkey 1, the number of VSD imaging sessions at each background contrast was 27, 6, 13, and 8. For monkey 2, the number of VSD imaging sessions at each background contrast was 19, 9, and 10. The background luminance was fixed at 30 cd/m2 for all experiments.

Although the stimulus conditions were essentially the same as in the detection experiment, only a subset of 10 natural background patches were tested. For each background contrast, each of the 10 background patches was presented once without the target and once with the target at each contrast level. Fewer backgrounds were tested because less time was available for making the VSDI measurements than the behavioral measurements. The natural background patches were manually selected to include a range of representative spatial structures from the full set (e.g., dense, sparse, oblique, horizontal, and vertical structure; see Fig. 1A).

Estimating VSDI Detection Performance

VSDI data were collected in 15 imaging days from monkey 1 and 11 imaging days from monkey 2. On each imaging day, we could collect several imaging sessions. The VSDI image frames were initially binned into 64 × 64-pixel frames, where each pixel corresponded to 0.11 × 0.11 mm of cortex. Spatial binning filters out some of the high-spatial-frequency shot noise in the camera images (9, 16). Next, aberrant VSDI trials (<5% of all trials) were identified and removed (see Ref. 9). For each trial, we normalized (divided) the fluorescence amplitude at each pixel location by the average at that location measured during the first 100 ms of imaging (which was before stimulus onset).
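
The binning and baseline-normalization steps can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' pipeline: binning is assumed to average non-overlapping 8 × 8 pixel blocks (512/64 = 8), and the function and parameter names are hypothetical.

```python
import numpy as np

def bin_frames(frames, factor=8):
    """Average non-overlapping factor x factor blocks: (T, H, W) -> (T, H/factor, W/factor)."""
    t, h, w = frames.shape
    blocks = frames.reshape(t, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))

def normalize_by_baseline(frames, n_baseline):
    """Divide each pixel by its mean over the first n_baseline (pre-stimulus) frames."""
    baseline = frames[:n_baseline].mean(axis=0)
    return frames / baseline
```

At a 100-Hz frame rate, the first 100 ms of imaging corresponds to `n_baseline = 10` frames.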

Following these preprocessing steps, neurometric functions were measured by applying a whitened template‐matching (WTM) decoder to the population responses on each trial. The decoder was constructed in a way proposed by Chen et al. (9). The WTM decoder is the optimal observer for detecting fixed targets in correlated additive Gaussian noise (e.g., see Ref. 20), which is a good description of the noise in VSDI responses at the retinotopic scale (see Ref. 9). In other words, our aim was to determine (approximately) the maximum discrimination performance possible given the information in the VSDI signals measured at the retinotopic scale. If down‐stream circuits are using this information with a fixed efficiency, then we would expect neurometric performance to parallel behavioral performance.

To specify the WTM decoder for a given imaging session, we first computed the average response in the 100‐ms integration period (black horizontal bars in Fig. 2A, rightmost), for each pixel location, when the background was uniform and the target was at its highest (9.5%) and lowest (0%) contrast (see Fig. 2A). We then subtracted the 0% contrast image from the highest target contrast (9.5%) image. In general, we found that these incremental response profiles are approximately 2-D Gaussians (Fig. 3, A and B). Thus, we fit the difference‐response images from each session with a two‐dimensional Gaussian function to obtain the “unwhitened” template. Figure 3B shows an example of this template.

Figure 3.

The whitened template-matching (WTM) decoder. WTM decoders are approximately optimal for detection in correlated Gaussian noise, which is a good description of the noise in VSDI measurements. The WTM decoder applies a whitened template (weighting function) to the spatial response pattern on each trial and responds that the target is present if the template response exceeds a criterion. A: the average difference in VSDI response to a 9.5% contrast target and a 0% contrast target (target absent) on a uniform background. For illustrative purposes only, we used imaging sessions from both monkeys to enhance response profiles for the same target stimulus. We applied an image registration procedure to align the center location and orientation of target-response profiles across sessions. We excluded a few sessions in which the image registration failed to align response profiles. As a result, we used a total of 22 imaging sessions from both monkeys (14 sessions from monkey 1 and 8 sessions from monkey 2). B: the 2-D Gaussian that was fit to the difference responses in A. For each imaging session, a 2-D Gaussian was fit to the average difference response in the uniform background conditions. This fitted Gaussian was used in determining the whitened template for that session. C: example whitened template computed from B. The whitened template is a filtered version of the fitted 2-D Gaussian that takes into account the spatial noise correlations. Specifically, the antagonistic center‐surround weights cancel much of the correlated noise, while removing little of the signal (see Ref. 9 and text for more details). Circular apertures are used for display purposes only. a.u., arbitrary units; VSDI, voltage‐sensitive dye imaging.

To determine the optimal “whitened” template, we first measured the radially averaged power spectrum of the VSD responses for the target-absent trials with a uniform background. We then divided the amplitude spectrum of the unwhitened template by the power spectrum (square of the amplitude spectrum) of the uniform‐background signals to obtain the amplitude spectrum of the whitened template (20). Finally, we inverse Fourier transformed this amplitude spectrum to obtain the whitened template (e.g., Fig. 3C; see Ref. 9 for more details).
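
The whitening step can be sketched in the Fourier domain with NumPy. This is a simplified illustration rather than the published procedure: it divides the template's full complex spectrum by the noise power spectrum (which preserves the template's phase), and it uses the raw two-dimensional power spectrum instead of a radially averaged one. Function names are hypothetical.

```python
import numpy as np

def noise_power_spectrum(noise_trials):
    """Mean power spectrum of target-absent responses, shape (n_trials, H, W)."""
    centered = noise_trials - noise_trials.mean(axis=0)
    spectra = np.fft.fft2(centered, axes=(-2, -1))
    return (np.abs(spectra) ** 2).mean(axis=0)

def whitened_template(template, noise_power):
    """Divide the template spectrum by the noise power spectrum, then invert.

    template    : 2-D array, the fitted (unwhitened) Gaussian template
    noise_power : 2-D array of the same shape, in np.fft.fft2 frequency layout
    """
    spectrum = np.fft.fft2(template)
    return np.real(np.fft.ifft2(spectrum / noise_power))
```

As a sanity check on the design, when the noise is white (flat power spectrum) the whitened template is simply a scaled copy of the unwhitened template; the center-surround structure appears only when the noise power is concentrated at low spatial frequencies.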

On each trial, the WTM decoder applies the whitened template to the population response (i.e., takes the dot product of the template and the population response) to obtain a single response scalar. If this response scalar exceeds a criterion the observer reports “target present,” otherwise “target absent”. Because we measure the real-valued response of the decoder, it is most efficient to estimate the detectability d′ of the decoder by taking the difference in the mean responses divided by the square root of the average variance of the responses, and then convert to percent correct (see Eq. 3).
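
The decoding step described above can be sketched as follows, assuming the single-trial responses are stored as a NumPy array of shape (n_trials, H, W). The function names are hypothetical, and the use of the unbiased sample variance is an assumption.

```python
import numpy as np

def decoder_responses(template, trials):
    """Dot product of the whitened template with each trial's response image."""
    return np.tensordot(trials, template, axes=([1, 2], [0, 1]))

def detectability(present, absent):
    """d' = difference in mean responses / sqrt(average of the two variances)."""
    avg_var = 0.5 * (np.var(present, ddof=1) + np.var(absent, ddof=1))
    return (np.mean(present) - np.mean(absent)) / np.sqrt(avg_var)
```

The resulting d′ can then be converted to percent correct with Eq. 3, without choosing an explicit decision criterion.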

Notice that the whitened template is most positive in the center where the response to the target is largest and eventually becomes negative away from the center where the response to the target becomes negligible. To get some intuition for why this whitened template is the near‐optimal decoder, we note that the noise correlations are substantial even at large distances from the center where the response to the target is weak or absent. This means that subtracting (negatively weighting) the responses at large distances cancels much of the correlated noise in the regions where the response to the target is strong, without cancelling much of the response to the target. This explains why the whitened template performs better than the simple unwhitened template (9).

In each imaging session, two neurometric functions were measured, one for the uniform background and one for natural backgrounds of some fixed contrast. We found that the neurometric functions for the uniform background varied substantially from session to session, almost surely because of variation of the quality of the dye staining and other nonneural scaling factors. Therefore, we combined the data across sessions using a simple model that accounts for the session‐to‐session variation in scaling factors.

Specifically, we assumed that the WTM response R on each trial is the sum of the neural response $R_n$, scaled by a dye efficiency constant $a_l$ for that session, plus a nonneural response $R_0$ due to measurement noise. If we let l index the session, k the background contrast, and j the target amplitude, then the responses to the specific stimuli in each session are given by

$$R_{jkl} = a_l R_{jk} + R_0 \tag{5}$$

If we define the response to the background alone as $R_{kl}$, then it follows that the measured detectability of the target in a given condition and session is as follows:

$$d'_{jkl} = \frac{a_l\,\Delta u_{jk}}{\sigma_{kl}} \tag{6}$$

where $\Delta u_{jkl} = E(R_{jkl}) - E(R_{kl})$ and $\sigma_{kl}^2 = \mathrm{VAR}(R_{kl})$ are the measured difference in the means and the measured variance, which we find is very nearly the same whether the target is present or absent. Furthermore, although we found that the values of mean responses varied substantially across sessions and background contrast, we found that the variances of the responses were relatively stable and uncorrelated with the variation in the means [Pearson correlation coefficients for monkey 1, r = 0.05 (P = 0.69), and monkey 2, r = 0.01 (P = 0.95)]. Thus, it was appropriate to assume that the standard deviations are approximately constant, $\sigma_{kl} \approx \sigma$. This standard deviation may represent a mixture of measurement and neural noise. However, because the response variances did not systematically vary across sessions and conditions, it was not possible, in this study, to separately estimate the contributions of measurement and neural noise. Nonetheless, we were able to use maximum likelihood methods to estimate simultaneously (from all data in all sessions for each monkey separately) the delta means, efficiency scalars, and the single standard deviation.
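
The step from Eq. 5 to Eq. 6 can be spelled out as follows. This is a sketch that assumes the nonneural term $R_0$ has the same expected value on target-present and target-absent trials; $R_k$ is introduced here only as shorthand for the neural response to background k alone, and $a_l$ is the session scale factor.

```latex
\begin{aligned}
\Delta u_{jkl} &= E(R_{jkl}) - E(R_{kl}) \\
 &= \bigl[a_l E(R_{jk}) + E(R_0)\bigr] - \bigl[a_l E(R_k) + E(R_0)\bigr] \\
 &= a_l \bigl[E(R_{jk}) - E(R_k)\bigr] = a_l\,\Delta u_{jk}, \\
d'_{jkl} &= \frac{\Delta u_{jkl}}{\sigma_{kl}} = \frac{a_l\,\Delta u_{jk}}{\sigma_{kl}}
\end{aligned}
```

Under this assumption the nonneural offset cancels in the difference of means, so the session scale factor multiplies the detectability directly.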

We note that when the value of $a_l$ is relatively low, the data from that session contribute relatively less (as they should) to the estimates of the delta means. We found that only one session, in monkey 2, had a very low value of $a_l$ (0.027) in comparison to the range of values in all other sessions (minimum: 0.36, maximum: 1.5, median: 0.9). All other sessions contributed substantially to the estimates of detectability.

Finally, the estimated neurometric functions were obtained by setting the efficiency scalar to its value on the best day, and then fitting the estimated detectabilities in the same way we fitted the behavioral psychometric functions (i.e., with Eqs. 3 and 4).

Fitting the Generalized Weber’s Law

We fitted the human, macaque, and V1 thresholds to determine whether they are simultaneously consistent with a generalized Weber’s law having exactly the same slope and intercept parameters, kb and c0, but potentially different overall scale factors ks (see the equation in Fig. 4C). We fitted all the thresholds in the three sets simultaneously by minimizing the squared error (9 human behavioral thresholds, 11 macaque behavioral thresholds, and 7 macaque neurometric thresholds). There was a total of four free parameters. Two of the parameters were the common slope and intercept parameters. The other two parameters were the values of the overall scale factor for the macaque behavioral and V1 thresholds (without loss of generality we could set the human overall scale factor to 1.0). To summarize the goodness of fit we report, in Fig. 4D, the fraction of variance explained, based on all the thresholds in the set (e.g., 9 thresholds for the goodness of fit to the human data).
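
One piece of this fit can be sketched in plain Python. The sketch assumes the scaled law has the form c_t = k_s · sqrt(k_b c_b² + c_0²), which is a guess at the equation shown in Fig. 4C and is not stated in the text. Under that assumption, for any candidate pair of shared parameters, the least-squares scale factor for one group has a closed form, so an outer search is needed only over k_b and c_0. All names are hypothetical.

```python
import math

def model(c_b, k_b, c_0, k_s=1.0):
    """Assumed form: k_s times the Eq. 1 threshold."""
    return k_s * math.sqrt(k_b * c_b ** 2 + c_0 ** 2)

def best_scale(c_bs, thresholds, k_b, c_0):
    """Least-squares k_s for one group, given the shared k_b and c_0."""
    base = [model(c, k_b, c_0) for c in c_bs]
    return sum(t * b for t, b in zip(thresholds, base)) / sum(b * b for b in base)

def r_squared(observed, predicted):
    """Fraction of variance explained: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

This illustrates only the inner step; the fit described in the text minimizes the squared error over all four parameters and all three threshold sets simultaneously.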

Figure 4.

Behavioral and neural detection performance. A: bias‐corrected psychometric functions of three monkeys for detection in uniform backgrounds (BG) and in natural backgrounds of several different RMS contrasts. The solid symbols and thick curves are the average psychometric functions. The thin curves are the psychometric functions of the individual monkeys. The error bars are standard errors across the three subjects. B: neurometric functions measured in two fixating monkeys. The solid symbols and thick curves are the average neurometric functions. The thin curves are the neurometric functions of the individual monkeys. The neurometric functions were obtained by applying a whitened template‐matching (WTM) decoder to voltage‐sensitive‐dye‐imaging (VSDI) responses recorded at a retinotopic scale in primary visual cortex (V1). C: threshold as a function of background contrast. The gray symbols are the average behavioral thresholds of three monkeys. The black symbols are the average thresholds of three humans for the same targets and backgrounds (although with a different range of background contrasts; data from Ref. 7). The red symbols are the average neurometric thresholds of the WTM decoder applied to the VSD responses of two monkeys. The solid curves represent the generalized Weber’s law with the same slope and intercept parameters (kb, c0), but different overall scale factors (ks). D: agreement between thresholds of humans, monkeys, and V1 population responses. The black curve is the estimated generalized Weber’s law when ks = 1 (see the equation in C). The symbols are the thresholds of the humans, monkeys, and V1 population responses after correcting for the effect of the scale factors. The R2 values indicate the fraction of variance explained by the generalized Weber’s law (Eq. 1) for all the thresholds in each subject group. 
$c_b$, root-mean-squared contrast of the background; $c_t$, root-mean-squared contrast of the target at threshold; $c_0$, threshold on the uniform background; $c_0^2$, intercept parameter; $k_b$, slope parameter; RMS, root mean square.

Statistical Analyses

Details of experimental procedures and visual stimuli are described above (see materials and methods: Detection Task and VSDI Fixation Task). Behavioral performance was quantified by fitting psychometric curves using maximum‐likelihood estimation. Summary statistics were derived separately for each individual subject, and standard errors were used to report across‐subject variability. For VSDI responses, confidence intervals for the WTM decoder’s performance in each condition were estimated by bootstrap sampling (500 iterations).
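
A bootstrap confidence interval for decoder detectability can be sketched as below. The text does not specify what was resampled or how the interval was formed; this sketch assumes that single-trial decoder responses are resampled with replacement within each condition and that a percentile interval is used. Names and defaults other than the 500 iterations are hypothetical.

```python
import random
from statistics import mean, variance

def d_prime(present, absent):
    """Difference in means divided by the square root of the average variance."""
    avg_var = 0.5 * (variance(present) + variance(absent))
    return (mean(present) - mean(absent)) / avg_var ** 0.5

def bootstrap_ci(present, absent, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval for d', resampling trials with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        p = rng.choices(present, k=len(present))
        a = rng.choices(absent, k=len(absent))
        estimates.append(d_prime(p, a))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper
```

With only 10 trials per condition, as in a single imaging session, such intervals will be wide, which is one reason to pool data across sessions before bootstrapping.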

Code/Software

Code is available upon reasonable request.

RESULTS

Behavioral Detection Performance

Psychometric functions were measured in three monkeys for detection of a 4-cpd windowed sine‐wave target presented at 2.5° eccentricity in randomly selected natural backgrounds scaled to one of four different contrasts. Figure 4A shows the average bias‐corrected psychometric functions for the three monkeys. The faint overlaid curves show the psychometric functions of the individual monkeys. As expected, the curves shift to the right as the background contrast increases. The gray circles in Fig. 4C show the average thresholds corresponding to 69% correct (d′ = 1); note that the thresholds were obtained separately for each monkey and then averaged. For comparison, the black circles show the average thresholds for three human observers for the same natural backgrounds at the same retinal eccentricity (but with a different range of background contrasts). The gray and black curves correspond to the generalized Weber’s law with the same slope and intercept parameters (kb, c0, see Eq. 1) up to a single human-to-monkey scale factor ks = 2.4. The gray and black symbols in Fig. 4D show the agreement after scaling. The R² values indicate the fraction of variance explained based on the average thresholds for the individual subjects (humans, 9 thresholds; monkeys, 10 thresholds): R² = 1 − (sum of squared residuals)/(total sum of squares).

Clearly, monkey and human thresholds in natural backgrounds are consistent with a generalized Weber’s law, where the ratio of slope to intercept is the same for the two species. This is evidence for a common neural computation and suggests that the macaque monkey is an appropriate animal model for human detection in natural backgrounds.
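The quantities in this paragraph can be made concrete with a short Python sketch. Eq. 1 is not reproduced in this section, so the functional form used here, ct = ks·sqrt(kb·cb² + c0²), is an assumption chosen to agree with the symbol definitions in the Fig. 4 caption (c0 is the threshold on the uniform background, c0² is the intercept, and kb is the slope). The function names are hypothetical. The first function shows why d′ = 1 corresponds to about 69% correct for an unbiased observer.

```python
import numpy as np
from statistics import NormalDist

def percent_correct(d_prime):
    """Unbiased yes/no accuracy for a given d': PC = Phi(d'/2)."""
    return NormalDist().cdf(d_prime / 2.0)

def weber_threshold(cb, kb, c0, ks=1.0):
    """Assumed generalized Weber's law: ct = ks * sqrt(kb * cb**2 + c0**2)."""
    cb = np.asarray(cb, dtype=float)
    return ks * np.sqrt(kb * cb**2 + c0**2)

def r_squared(observed, predicted):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Under this assumed form, changing ks multiplies every threshold by the same factor, which is what "the same slope and intercept parameters up to a single scale factor" means: on a logarithmic threshold axis the curves for different subject groups are vertical translations of one another.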

Neural Detection Performance

The consistency of macaque and human detection performance in natural backgrounds motivated us to make some initial measurements of the neural population responses in primary visual cortex of fixating macaque monkeys. These measurements were made in two additional monkeys that did not participate in the behavioral experiments. To assess the masking effects of natural backgrounds on detection sensitivity of neural population responses at the retinotopic scale in macaque V1, we applied a whitened template‐matching (WTM) decoder (Fig. 3) to the single‐trial responses. The appropriateness of this decoder is supported by the fact that the noise spectrum was similar (not shown here) across the contrast levels of the target and across the contrast levels of the natural backgrounds.
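For readers unfamiliar with whitened template matching, the following Python sketch shows the generic computation: the template is weighted by the inverse of the noise covariance, and each single-trial response is projected onto the resulting weights to give a scalar decision variable. This is a textbook spatial-domain version written under stated assumptions (responses flattened to pixel vectors, a known noise covariance, a small ridge term for numerical stability). It is not the implementation shown in Fig. 3, and the function names are hypothetical.

```python
import numpy as np

def wtm_weights(template, noise_cov, ridge=1e-6):
    """Whitened-template weights w = inv(Sigma) @ t, with a small ridge."""
    template = np.asarray(template, dtype=float)
    noise_cov = np.asarray(noise_cov, dtype=float)
    n = noise_cov.shape[0]
    # Ridge scaled by the mean variance keeps the solve well conditioned.
    reg = noise_cov + ridge * (np.trace(noise_cov) / n) * np.eye(n)
    return np.linalg.solve(reg, template)

def decode(responses, weights):
    """Scalar decision variable per trial; responses has shape (trials, pixels)."""
    return np.asarray(responses, dtype=float) @ weights

def ideal_dprime(template, noise_cov, ridge=1e-6):
    """d' of the whitened matched filter for Gaussian noise: sqrt(t' inv(Sigma) t)."""
    w = wtm_weights(template, noise_cov, ridge)
    return float(np.sqrt(np.asarray(template, dtype=float) @ w))
```

When the noise is already white (identity covariance), the weights reduce to the template itself and the decoder becomes ordinary template matching.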

Figure 4B shows the average neurometric functions from the VSDI responses of the two monkeys for uniform and natural backgrounds of the same contrasts used in the behavioral experiments. The faint curves are the neurometric functions of the individual monkeys. In agreement with behavior, the neurometric functions of the WTM decoder shift to the right as the background contrast increases.

As in the behavioral experiments, we defined the neural threshold as the target contrast corresponding to d′ = 1. The red symbols in Fig. 4C are the average neural thresholds for the two monkeys, and the solid curve shows the generalized Weber’s law for exactly the same slope and intercept parameters as for the human and macaque behavioral data, but with a different overall scale factor (ks = 3.2). The red symbols in Fig. 4D show the agreement after scaling (R² value based on 8 thresholds).
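One simple way to estimate a single scale factor when the shape of the curve is held fixed is ordinary least squares, which has a closed-form solution. The Python sketch below illustrates this. The fitting criterion actually used in the study is not stated in this section, so the least-squares choice, the function names, and the numbers in the usage note are assumptions for illustration only.

```python
import numpy as np

def fit_scale(observed, unit_prediction):
    """Least-squares ks minimizing sum((observed - ks * unit_prediction)**2)."""
    observed = np.asarray(observed, dtype=float)
    unit_prediction = np.asarray(unit_prediction, dtype=float)
    return float(observed @ unit_prediction / (unit_prediction @ unit_prediction))

def rescale(observed, ks):
    """Divide thresholds by the fitted scale factor so groups can be overlaid."""
    return np.asarray(observed, dtype=float) / ks
```

For example, if `unit_prediction` holds the thresholds predicted with ks = 1 and `observed` holds one group's measured thresholds, `rescale(observed, fit_scale(observed, unit_prediction))` gives thresholds that can be plotted on the common ks = 1 curve, as in Fig. 4D.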

The behavioral psychometric functions of the macaque monkeys (see Fig. 4A) showed evidence of a small lapse rate of ∼5%. We also computed neurometric functions assuming a 5% lapse rate in decision-making. The scale factor for the neurometric thresholds increased slightly, but the agreement after scaling was as good as, or better than, that shown in Fig. 4D. We also varied the value of d′ used to define threshold and found that the agreement after scaling was as good as, or better than, that shown in Fig. 4D. These results strengthen the evidence for a common neural computation and suggest that the macaque monkey (and human) behavioral thresholds are closely linked to the combined neural computations in retina and V1.
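The effect of a lapse rate on threshold can be illustrated with a small Python sketch. It assumes that on a fraction λ of trials the response is a random guess (50% correct) and that performance otherwise follows PC = Φ(d′/2). How the lapse rate was incorporated into the neurometric functions in the study is not described in this section, so this model and the function names are assumptions.

```python
from statistics import NormalDist

_NORMAL = NormalDist()

def pc_with_lapse(d_prime, lapse=0.05):
    """Accuracy when a fraction `lapse` of trials are random guesses."""
    pc = _NORMAL.cdf(d_prime / 2.0)
    return (1.0 - lapse) * pc + lapse * 0.5

def dprime_needed(target_pc, lapse=0.05):
    """Underlying d' required to reach target_pc in the presence of lapses."""
    pc_no_lapse = (target_pc - 0.5 * lapse) / (1.0 - lapse)
    return 2.0 * _NORMAL.inv_cdf(pc_no_lapse)
```

Under this model, reaching the same criterion accuracy with a 5% lapse rate requires an underlying d′ slightly above 1, so thresholds rise modestly. This is qualitatively consistent with the small increase in the neurometric scale factor reported above.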

DISCUSSION

Natural selection causes perceptual and cognitive mechanisms to be relatively well matched to an organism’s natural tasks and stimuli. Thus, analyzing natural tasks and stimuli can be useful for obtaining principled hypotheses for neural computation. Furthermore, to fully understand neural computations, it is crucial to measure and analyze behavioral and neural responses to the natural stimuli that the nervous system evolved to process. Here, we measured psychometric and neurometric functions in the macaque monkey for detection of a simple target in uniform backgrounds and in natural backgrounds of various contrasts. We chose the macaque because anatomical, physiological, and behavioral studies (with simple stimuli) have shown the macaque visual system to be an excellent animal model of the human visual system.

We found that the psychometric functions and behavioral thresholds of three macaque monkeys measured at 2.5-degree eccentricity closely matched (up to a single overall scale factor) those of three human observers measured in an earlier study (7) for the same stimuli at the same retinal eccentricity. Both humans and macaques followed the generalized Weber’s law as a function of background contrast for detection in natural backgrounds. Importantly, we found that the slope and intercept parameters of the generalized Weber’s law were the same in humans and macaques (up to a single scale factor), suggesting a common neural mechanism. Macaque thresholds were higher than human thresholds. This is likely to reflect a mixture of differences between humans and macaques in post-V1 processing (e.g., decision noise) as well as differences in acuity in the parafovea (the target’s spatial frequency of 4 cpd is relatively high for macaque parafovea, due at least in part to the smaller eye size of macaques, Ref. 21).

In a second study, we measured neurometric functions (for a subset of the same stimuli) from VSDI responses recorded at the retinotopic scale in the region of macaque V1 corresponding to ∼2.5-degree eccentricity. The neurometric functions were computed using a whitened template-matching (WTM) observer, which is near‐optimal for VSDI responses at the retinotopic scale. We found that the neurometric thresholds measured in two macaques also followed the generalized Weber’s law with the same slope and intercept parameters (up to a single scale factor) as the behavioral thresholds of the three macaques and three humans. Neurometric thresholds were in general higher than psychometric thresholds. This is not surprising given that V1 population responses, as measured by VSDI, are affected by various sources of measurement noise that can lead to a fixed inefficiency in our neurometric functions. This is the reason that we focused on comparing the relative change in neurometric and psychometric thresholds as a function of background contrast rather than directly comparing these thresholds. To analyze the data, we also introduced a simple new method for combining VSDI measurements across imaging sessions that considers variation in the quality of dye staining and other nonneural scale factors.

The main conclusions are that 1) human and macaque abilities to detect targets in natural backgrounds are very similarly affected by background contrast, 2) these effects (generalized Weber’s law) suggest a common neural mechanism, 3) population decoding at the retinotopic scale by down‐stream circuits is consistent with the behaviorally measured generalized Weber’s law, and 4) the macaque monkey is an appropriate animal model for gaining an understanding of the neural mechanisms in humans for detecting targets in natural backgrounds.

In humans, the generalized Weber’s law is known to hold for detection of targets in white noise backgrounds (22, 23) and in 1/f noise backgrounds (7, 24). However, Weber’s law has generally not been found to hold in simple backgrounds such as sinewave gratings (e.g., see Ref. 25). Thresholds in sinewave-grating backgrounds sometimes show a decrease in threshold at low background contrasts (the “dipper” effect), and they typically follow a power law with an exponent less than 1.0 at higher background contrasts. Thus, the current results in monkeys, and previous results in humans (7, 8), show that detection in natural backgrounds is more like detection in noise backgrounds than detection in simple backgrounds. In other words, Weber’s law (with no dipper effect) appears to be a better description of contrast masking under real-world conditions. Sebastian et al. (8) also show that Weber’s law is predicted directly from the statistical properties of natural backgrounds and is consistent with the contrast normalization mechanisms found in the retina and visual cortex.

In the current study, which was based on Bradley et al. (7), the natural background on each trial was adjusted to have a Gaussian gray‐scale histogram so that every randomly selected background, even with an added high‐contrast target, could be presented on a standard display without clipping. Also, the natural backgrounds were adjusted to have specific RMS contrasts. A different approach, based on constrained sampling from natural images, was used by Sebastian et al. (8). In the constrained‐sampling approach, millions of natural background patches the size of the target are sorted into a three‐dimensional histogram along the three background dimensions known to have a big effect on detection performance: luminance, contrast, and structural (spatial frequency and orientation) similarity to the target. One advantage of the constrained‐sampling approach is that no adjustments of the natural backgrounds are necessary because measurements can be limited to the space of bins where significant clipping does not occur. A second major advantage is that by sampling backgrounds from a sparse set of bins it is possible to measure, in modest‐scale experiments, how the three dimensions affect detection performance individually and in combination. We also note that a limitation of the current study was that the VSDI measurements were made in a different set of fixating monkeys, and hence it was not possible to analyze the trial‐by‐trial correlations between the neural and behavioral responses. A logical next step, which we are currently pursuing, is to simultaneously measure VSDI responses and behavioral responses in constrained sampling experiments at both the retinotopic scale, and at the finer orientation‐column scale where the neural effects of background similarity may become more apparent.
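The RMS-contrast adjustment of the backgrounds mentioned at the start of this paragraph can be sketched in Python as follows. The sketch assumes that RMS contrast is defined as the root mean square of the contrast image (L − mean L)/mean L over the whole patch, and that the mean luminance is preserved. The exact definition used for the stimuli (for example, any spatial windowing) and the Gaussian histogram adjustment are not covered, and the function name is hypothetical.

```python
import numpy as np

def set_rms_contrast(patch, target_rms):
    """Rescale a luminance patch to a target RMS contrast, keeping its mean."""
    patch = np.asarray(patch, dtype=float)
    mean_lum = patch.mean()
    # Contrast image: fractional deviation from the mean luminance.
    contrast = (patch - mean_lum) / mean_lum
    current_rms = np.sqrt(np.mean(contrast ** 2))
    return mean_lum * (1.0 + contrast * (target_rms / current_rms))
```

Note that this rescaling alone does not guarantee that pixel values stay within the display range. That is the clipping problem that the histogram adjustment, or alternatively the constrained-sampling approach, is meant to address.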

Other limitations of the current study are that the target was a simple Gabor patch, and the target and natural backgrounds were gray-scale rather than natural-color stimuli. The generalized Weber’s law for masking in noise backgrounds is known to hold for a wide range of targets (see references above). Furthermore, we know that threshold is proportional to the similarity of the background to the target and that this effect is independent of the effect of background contrast (8). Thus, the more similar a target is to a natural background the higher would be the threshold, but Weber’s law for contrast should still hold. Thus, it is likely that our conclusions would be the same for any fixed target.

We chose to focus here on gray-scale images for simplicity and because the literature on detection in natural and artificial backgrounds mostly concerns gray-scale images. Another logical next step would be to parametrically measure how thresholds are affected by the chromatic similarity between targets and natural backgrounds.

GRANTS

This work was supported by US National Institutes of Health Research Grants EY024662, EY016454, and EY11747.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

Y.B., Y.C., W.S.G., and E.S. conceived and designed research; Y.B., S.C., and Y.C. performed experiments; Y.B. and S.C. analyzed data; Y.B., W.S.G., and E.S. interpreted results of experiments; Y.B. prepared figures; Y.B. and W.S.G. drafted manuscript; Y.B., W.S.G., and E.S. edited and revised manuscript; W.S.G. and E.S. approved final version of manuscript.

ACKNOWLEDGMENTS

Present address of Y. Bai: Dept. of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts.

REFERENCES

1. De Valois RL, Morgan H, Snodderly DM. Psychophysical studies of monkey vision‐III: spatial luminance contrast sensitivity tests of macaque and human observers. Vision Res 14: 75–81, 1974. doi: 10.1016/0042-6989(74)90118-7.
2. Harwerth RS, Smith EL. Rhesus monkey as a model for normal vision of humans. Am J Optom Physiol Opt 62: 633–641, 1985. doi: 10.1097/00006324-198509000-00009.
3. Merigan WH, Katz LM. Spatial resolution across the macaque retina. Vision Res 30: 985–991, 1990. doi: 10.1016/0042-6989(90)90107-V.
4. De Valois RL, Jacobs GH. Primate color vision. Science 162: 533–540, 1968. doi: 10.1126/science.162.3853.533.
5. Harwerth RS, Smith EL. Binocular summation in man and monkey. Am J Optom Physiol Opt 62: 439–446, 1985. doi: 10.1097/00006324-198507000-00002.
6. Harwerth RS, Smith EL, Siderov J. Behavioral studies of local stereopsis and disparity vergence in monkeys. Vision Res 35: 1755–1770, 1995. doi: 10.1016/0042-6989(94)00256-L.
7. Bradley C, Abrams J, Geisler WS. Retina‐V1 model of detectability across the visual field. J Vis 14: 221–222, 2014. doi: 10.1167/14.12.22.
8. Sebastian S, Abrams J, Geisler WS. Constrained‐sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci USA 114: E5731–E5740, 2017. doi: 10.1073/pnas.1619487114.
9. Chen Y, Geisler WS, Seidemann E. Optimal decoding of correlated neural population responses in the primate cortex. Nat Neurosci 9: 1412–1420, 2006. doi: 10.1038/nn1792.
10. Chen Y, Geisler WS, Seidemann E. Optimal temporal decoding of V1 population responses in a reaction‐time detection task. J Neurophysiol 99: 1366–1379, 2008. doi: 10.1152/jn.00698.2007.
11. Seidemann E, Arieli A, Grinvald A, Slovin H. Dynamics of depolarization and hyperpolarization in the frontal cortex and saccade goal. Science 295: 862–865, 2002. doi: 10.1126/science.1066641.
12. Palmer C, Cheng SY, Seidemann E. Linking neuronal and behavioral performance in a reaction-time visual detection task. J Neurosci 27: 8122–8137, 2007. doi: 10.1523/JNEUROSCI.1940-07.2007.
13. Geisler WS. Psychometric functions of uncertain template matching observers. J Vis 18: 1, 2018. doi: 10.1167/18.2.1.
14. Michel MM, Geisler WS. Intrinsic position uncertainty explains detection and localization performance in peripheral vision. J Vis 11: 18, 2011. doi: 10.1167/11.1.18.
15. Shoham D, Glaser DE, Arieli A, Kenet T, Wijnbergen C, Toledo Y, Hildesheim R, Grinvald A. Imaging cortical dynamics at high spatial and temporal resolution with novel blue voltage‐sensitive dyes. Neuron 24: 791–802, 1999. doi: 10.1016/S0896-6273(00)81027-2.
16. Chen Y, Palmer CR, Seidemann E. The relationship between voltage‐sensitive dye imaging signals and spiking activity of neural populations in primate V1. J Neurophysiol 107: 3281–3295, 2012. doi: 10.1152/jn.00977.2011.
17. Grinvald A, Hildesheim R. VSDI: a new era in functional imaging of cortical dynamics. Nat Rev Neurosci 5: 874–885, 2004. doi: 10.1038/nrn1536.
18. Slovin H, Arieli A, Hildesheim R, Grinvald A. Long‐term voltage‐sensitive dye imaging reveals cortical dynamics in behaving monkeys. J Neurophysiol 88: 3421–3438, 2002. doi: 10.1152/jn.00194.2002.
19. Arieli A, Grinvald A, Slovin H. Dural substitute for long‐term imaging of cortical activity in behaving monkeys and its clinical implications. J Neurosci Methods 114: 119–133, 2002. doi: 10.1016/S0165-0270(01)00507-6.
20. Brunelli R, Poggio T. Template matching: matched spatial filters and beyond. Pattern Recognit 30: 751–768, 1997. doi: 10.1016/S0031-3203(96)00104-5.
21. Goodchild AK, Ghosh KK, Martin PR. Comparison of photoreceptor spatial density and ganglion cell morphology in the retina of human, macaque monkey, cat, and the marmoset Callithrix jacchus. J Comp Neurol 366: 55–75, 1996.
22. Burgess AE, Wagner RF, Jennings RJ, Barlow HB. Efficiency of human visual signal discrimination. Science 214: 93–94, 1981. doi: 10.1126/science.7280685.
23. Legge GE, Kersten D, Burgess AE. Contrast discrimination in noise. J Opt Soc Am A 4: 391–404, 1987. doi: 10.1364/JOSAA.4.000391.
24. Najemnik J, Geisler WS. Optimal eye movement strategies in visual search. Nature 434: 387–391, 2005. doi: 10.1038/nature03390.
25. Swift DJ, Smith RA. Spatial frequency masking and Weber’s law. Vision Res 23: 495–505, 1983. doi: 10.1016/0042-6989(83)90124-4.

Articles from Journal of Neurophysiology are provided here courtesy of American Physiological Society
