i-Perception. 2024 Jan 9;15(1):20416695231224138. doi: 10.1177/20416695231224138

Temporal integration characteristics of an image defined by binocular disparity cues

Fumiya Haraguchi, Rumi Hisakata, Hirohiko Kaneko
PMCID: PMC10777792  PMID: 38204517

Abstract

We can correctly recognize the content of an image even when all of its elements are presented within a limited time, as in slit viewing or a spatially divided painting. Clarifying how temporally divided information is integrated and perceived is important for understanding the temporal properties of the visual system's information-processing mechanisms. Previous studies on this topic have mostly used two-dimensional pictorial stimuli; few have considered the temporal integration of binocular disparity for the recognition of objects defined by disparity. In this study, we examined image recognition based on the temporal integration of binocular disparity by comparing it with recognition based on the temporal integration of luminance. The effect of element onset asynchrony (the time lag among presented elements) was broadly similar for disparity and luminance when the elements were divided randomly. On the other hand, under slit-vision conditions, the tolerance range of spatiotemporal integration for luminance stimuli was much wider than that for disparity stimuli. These results indicate that the temporal integration mechanism for localized areas is common to disparity and luminance, whereas integration based on global motion differs between the two. Thus, we conclude that global motion contributes little to the temporal integration of binocular disparity information for image recognition.

Keywords: temporal integration, binocular disparity, shape recognition, slit vision


The visual system integrates information spatially and temporally across the visual field to perceive the outside world. One reason for this property is that visual acuity varies greatly with position in the visual field: resolution is high in the fovea but lower in the periphery. To correctly recognize an entire visual scene, it is therefore necessary to integrate information presented with spatial and temporal offsets produced by eye movements.

Some studies have shown that it is possible to integrate visual information with time lags for scene recognition. Ikeda and Uchikawa (1978) investigated the mechanism of temporal integration for object recognition by measuring performance when a painting divided into elements was displayed with temporal asynchronies. When all of the elements were presented within a short period, the content of the painting was recognized. By contrast, if it took a long time to present all of the elements, the participants were unable to recognize what was drawn. This temporal integration of visual patterns is explained by the temporal properties of lower-order visual processing, such as visual information storage, which retains visual information after it has disappeared (Coltheart, 1980; Nikolić et al., 2009; Teeuwen et al., 2021). It has been shown that patterns presented within a short time window of a few hundred milliseconds can be perceived as if all were seen simultaneously (Ikeda & Uchikawa, 1978). In the experiment by Ikeda and Uchikawa (1978), there were two conditions: one in which the participants were asked to respond verbally to the patterns as image recognition, and another in which they were asked to draw the patterns they observed. In the drawing condition, the participants were able to reproduce the image pattern even if they did not understand its meaning, and they understood the meaning of the pattern only when they saw the pattern they themselves had reproduced and drawn. This tendency was also found for tactile recognition of image patterns formed by the unevenness of a plastic stimulus. Thus, even if a pattern is not recognized as meaningful at the time of stimulus observation, it can be recognized after being reproduced and drawn as a picture. Such multisensory and time-consuming pattern recognition is qualitatively different from the perceptual image obtained with a short presentation. Although it has been suggested that higher-order working memory is involved in pattern recognition resulting from the integration of tactile and visual sensation (Matusz et al., 2017; Quak et al., 2015; Shioiri et al., 1983), the perceptual image obtained after a short presentation time of several hundred milliseconds is basically a function of lower-order temporal integration in visual processing (e.g., Burr & Morrone, 1993). It can be regarded as a kind of bottom-up perceptual processing.

On the other hand, it is known that even in the temporal integration of lower-order visual processing, different time windows can be obtained depending on the task and the way the visual stimuli are presented (Fink et al., 2006; Melcher et al., 2014). For many types of stimulus presentation, two stimuli presented within 40 ms are perceived as synchronous; however, in some situations, stimuli can be integrated over longer time windows. For example, even if a stimulus is presented behind a thin slit and only a small portion of it is visible at any moment, participants can recognize the whole object behind the slit. In addition, Parks (1965) found that when a stimulus moves behind a narrow slit, the perception of the stimulus differs depending on its velocity (the presentation duration of the stimulus); specifically, if the stimulus moves quickly behind the slit, its size is perceived as shrunk, whereas if it moves slowly, participants cannot perceive the whole object and its size appears expanded. This indicates that we can reconstruct information behind the slit from the spatiotemporal information of the motion through the slit window. Just as drawing can reproduce an image whose parts were presented in random spatial order (Ikeda & Uchikawa, 1978), slit viewing involves higher-order visual integration in which local structures given in fragments are supplemented by linking them together using global motion information. Such temporal completion may be used not only in the special situation of slit viewing, but also in the integration of scenes across saccadic eye movements.

Temporal integration of visual information should be done within reasonable temporal limits, but there is no guarantee that each visual attribute has similar temporal integration properties. For example, there are various cues to the depth of an object or scene: what are the temporal integration properties of binocular disparity, which is obtained by integrating information from both eyes? Some studies have investigated three-dimensional (3D) object recognition from stimulus motion, although most slit-vision studies have focused on two-dimensional pictorial recognition of images. Research has shown that shape perception is possible even in the slit vision of a rotating object or a cube frame (Norman et al., 2021). This indicates that even under slit-vision conditions, a participant is able to temporally integrate local motion information to recognize and estimate the whole object's motion and restore its 3D shape. To date, studies of slit vision with 3D shapes have dealt mostly with the temporal integration of monocular depth cues, such as shading and rotational motion. We assume here that the temporal integration of binocular disparity is an important cue for detecting object depth in 3D shape recognition. However, no studies have investigated the temporal integration of binocular disparity for object recognition.

In this study, we attempted to clarify the properties of temporal integration of binocular disparity for shape recognition by comparing it with that of luminance information. As targets, we used two-digit numbers defined by binocular disparity or luminance in random-dot stimuli, in which the dots were divided into groups and presented with time lags, and we measured response accuracy.

In Experiment 1, the stimulus elements were divided into groups, and the groups were presented with element onset asynchrony (EOA); that is, we manipulated the interval between the onset of the first presented elements and that of the last presented elements, so that the total presentation duration varied. The image was defined by binocular disparity or luminance. We examined how the elements presented with a time lag were integrated and perceived. Our results showed that the larger the EOA among the element groups, the more difficult it was to integrate the stimulus, whether defined by disparity or luminance, into a perceived image. In Experiment 1, the proportion of stimulus elements presented to the participant at the same time (hereinafter referred to as the instantaneous maximum ratio [IMR] of the stimulus elements) changed with the EOA conditions. To clarify whether the EOA or the IMR is important for recognizing the object's shape, we fixed the IMR across EOA conditions in the next experiment. We found that even when the IMR was constant, image-recognition performance in the random presentation decreased as the EOA among elements increased. Furthermore, this EOA effect did not differ between stimuli defined by disparity and luminance.

We also examined the effect of the presentation order of elements in Experiment 2. We compared a slit-vision presentation, in which the stimulus elements were presented in order from one edge of the image (left or right), with a random presentation, in which the elements were presented in random order. Under the slit-vision condition, the rate of correct answers for the luminance stimulus increased rapidly with the EOA, whereas that for the disparity stimulus increased more slowly.

In Experiment 3, we investigated the effect of eye movements induced by the movement of the slit window by comparing a slit-motion condition with a condition in which the stimulus moved behind a fixed slit window. Under the slit-motion condition, the slit moved in front of a stationary image; under the stimulus-motion condition, the stimulus image moved behind a stationary slit. In both conditions, the sequence of stimulus information revealed through the slit was the same. We found that the rate of correct answers did not change much between the two conditions for the stimulus defined by binocular disparity, suggesting that eye movements following the slit window did not account for the results of Experiment 2.

Taken together, the results of the three experiments suggest that the characteristics of local temporal integration for both disparity and luminance are similar, whereas those of spatiotemporal integration based on motion detection mechanisms differ. Thus, motion information does not appear to contribute to the integration of binocular disparity for shape perception.

Experiment 1: Temporal Integration of Binocular Disparity and Luminance for Shape Recognition

In Experiment 1, two types of stimuli, one defined by luminance and the other defined by binocular disparity, were used to compare the properties of their temporal integration for image perception. The stimulus consisted of random dots and the task was to detect two-digit numbers defined by binocular disparity or luminance. We divided the stimulus into element groups and introduced EOA while measuring the recognition rate of the two-digit numbers.

Methods

Participants

Eight participants with normal or corrected-to-normal visual acuity, including the three authors, participated in the experiment. We conducted a simple stereo test (Stereo Fly test, Stereo Optical Co., Chicago, IL, USA) to measure stereoscopic acuity, confirming that all participants could detect a disparity of 200 arcsec of visual angle. One participant whose rate of correct answers in a preliminary experiment was below 90% for stimuli defined by disparity did not perform the main experiment. The experiment was conducted with the approval of the Tokyo Institute of Technology Ethics Review Board.

Apparatus

In the experiment, a haploscope was used, consisting of a control computer (MacBook Pro, OS X Yosemite, Apple, Inc., Cupertino, CA, USA), two mirrors placed at right angles, and two displays placed in parallel (Sony PVM-A170 professional video monitors; 60 fps; 36.58 × 20.57 cm; viewing angle 53.2° × 31.5°; Sony Corp., Tokyo, Japan). The experiment was conducted in a dark room. The viewing distance from the eye to the center of each display was 36.5 cm.

Stimuli

The stimulus consisted of random dots (3 pixels in diameter, 800 dots in total) within an area of 8.95° × 7.46°. In this random-dot stimulus, a two-digit number was defined by binocular disparity or luminance, and the participant's task was to identify it. The numbers were drawn at random from the eight digits 0–9 excluding "1" and "7" (Figure 1(a)), which were excluded because they are easier to recognize than the other digits. With eight alternatives per digit, the chance level for reporting both digits correctly was 1/64, or approximately 1.56%. Each dot was presented for 300 ms.

Figure 1. (a) The original image of a two-digit number. (b) The dot image defined by luminance difference. (c) The dot image defined by binocular disparity.

Conditions

In the luminance condition, the numbers were formed by a group of dots with a luminance of 80 cd/m2, and the remaining dots were displayed at a lower luminance (Figure 1(b)). In the disparity condition, the numbers were formed by a group of dots with a crossed disparity, whose magnitude was determined individually in the preliminary experiment; the remaining dots were displayed with zero disparity with respect to the display surface (Figure 1(c)). The numbers defined by luminance could be recognized even with a single eye, whereas the numbers defined by disparity could be detected only with binocular vision. The method for determining the disparity magnitude and the luminance is described below.
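To illustrate how a shape can be defined purely by binocular disparity in a random-dot pattern, the following Python sketch builds a left-eye and a right-eye dot image in which dots falling inside a number-shaped region are shifted horizontally in opposite directions, producing a crossed disparity. The dot count, shift magnitude, and mask used here are simplified placeholders, not the exact parameters of the experiment.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters; the actual experiment used 800 dots in an
# 8.95 deg x 7.46 deg field and an individually determined disparity.
N_DOTS = 800
WIDTH, HEIGHT = 480, 400          # stimulus area in pixels
DISPARITY_PX = 4                  # horizontal shift producing crossed disparity

# Random dot positions shared by the two eyes.
x = rng.uniform(0, WIDTH, N_DOTS)
y = rng.uniform(0, HEIGHT, N_DOTS)

def inside_target(x, y):
    """Hypothetical mask for the 'number' region (here just a central rectangle)."""
    return (WIDTH * 0.3 < x) & (x < WIDTH * 0.7) & (HEIGHT * 0.3 < y) & (y < HEIGHT * 0.7)

in_target = inside_target(x, y)

# The two eyes' images are identical except that target dots are shifted in
# opposite directions, so the shape is invisible monocularly but appears in
# front of the background when the two images are fused.
x_left = np.where(in_target, x + DISPARITY_PX / 2, x)
x_right = np.where(in_target, x - DISPARITY_PX / 2, x)

left_image = np.stack([x_left, y], axis=1)
right_image = np.stack([x_right, y], axis=1)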

The time from the onset of the first dot to the onset of the last dot was manipulated. We defined this value as the EOA and tested six conditions: 0, 156.25, 312.5, 625, 1250, and 2500 ms. Figure 2 shows the time course of the number of dots presented for three of the EOA conditions (625, 1250, and 2500 ms). The maximum number of simultaneously presented dots decreased as the EOA became longer, given that the presentation duration of each dot was constant (300 ms). When the presentation of the stimulus was initiated, the number of displayed dots increased gradually, reached a specific value, and then gradually decreased. The table in Figure 2 shows, for each EOA condition, the maximum number of dots presented at the same time and the corresponding percentage of the total, which we define as the IMR in this research. For example, an IMR of 50% means that, in one frame, an instantaneous maximum of 400 dots was presented, that is, half of the total of 800 dots.
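To make the relation between the EOA and the IMR concrete, the following sketch simulates the number of simultaneously visible dots, assuming that the 800 dot onsets are spread evenly over the EOA and that each dot remains visible for 300 ms (the onset schedule is our assumption). Under this assumption the maxima come out near 48%, 24%, and 12% of the dots for EOAs of 625, 1250, and 2500 ms, which is consistent with the IMR values used in Experiment 2.

import numpy as np

def visible_dot_count(eoa_ms, n_dots=800, dot_duration_ms=300.0, frame_ms=1000.0 / 60):
    """Count how many dots are visible in each frame, assuming dot onsets
    are spread evenly across the EOA (our assumption about the schedule)."""
    onsets = np.linspace(0.0, eoa_ms, n_dots)           # onset time of each dot
    offsets = onsets + dot_duration_ms
    t = np.arange(0.0, eoa_ms + dot_duration_ms, frame_ms)
    # A dot is visible at time t if onset <= t < offset.
    visible = (onsets[None, :] <= t[:, None]) & (t[:, None] < offsets[None, :])
    return t, visible.sum(axis=1)

for eoa in (625, 1250, 2500):
    _, counts = visible_dot_count(eoa)
    imr = counts.max() / 800 * 100
    print(f"EOA {eoa:>4} ms: max {counts.max():>3} dots visible at once (IMR ~ {imr:.0f}%)")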

Figure 2. Time course of the number of presented dots in three element onset asynchrony (EOA) conditions (625, 1250, and 2500 ms). The horizontal axis is the time from the stimulus onset (ms) and the vertical axis is the number of presented dots.

Procedures

Prior to the main experiment, we conducted a preliminary experiment to compare the rates of correct answers for the disparity- and luminance-defined stimuli for each participant. In this experiment, all of the stimulus elements were presented simultaneously for 300 ms, the same presentation as the 0-ms EOA condition of the main experiment. The disparity or luminance was varied with a simple 1-up/1-down staircase, and the resulting data were used to estimate the value yielding a rate of correct answers of 90%. When the disparity or luminance exceeded the defined range (between −6 and 0 arcmin for disparity and between 0 and 50 cd/m2 for luminance), the value was kept at the maximum or minimum of the range. In total, 120 trials were conducted for each of the disparity and luminance stimuli. The results were fitted with a cumulative normal distribution function using the maximum likelihood method, and we calculated the luminance and disparity at which the rate of correct answers was 90% for each participant. These values were used in the main experiment.
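The threshold estimation described above can be sketched as follows: the staircase data (stimulus value, correct/incorrect) are fitted by maximum likelihood with a cumulative normal scaled between the 1.56% chance level and 100%, and the value giving 90% correct is read from the fitted curve. The parameterization, the SciPy-based fit, and the example data are our assumptions, not the authors' code.

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

CHANCE = 1 / 64          # two digits, eight alternatives each

def p_correct(x, mu, sigma):
    """Cumulative-normal psychometric function scaled from chance to 1."""
    return CHANCE + (1 - CHANCE) * norm.cdf((x - mu) / sigma)

def neg_log_likelihood(params, x, correct):
    mu, log_sigma = params
    p = np.clip(p_correct(x, mu, np.exp(log_sigma)), 1e-6, 1 - 1e-6)
    return -np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))

def threshold_90(x, correct):
    """Fit the psychometric function and return the stimulus value giving 90% correct."""
    res = minimize(neg_log_likelihood, x0=[np.mean(x), 0.0], args=(x, correct),
                   method="Nelder-Mead")
    mu, sigma = res.x[0], np.exp(res.x[1])
    # Invert p_correct(x) = 0.9 for x.
    return mu + sigma * norm.ppf((0.9 - CHANCE) / (1 - CHANCE))

# Example with hypothetical staircase data (stimulus magnitude, e.g., disparity
# in arcmin; 1 = correct response). Larger values are assumed to be easier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 1.5, 2.5, 3.5, 4.5])
correct = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1])
print(threshold_90(x, correct))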

In the main experiment, we randomly presented the EOA conditions within a block. Thirty trials were repeated for each EOA condition, and there were 12 blocks, giving a total of 360 trials. The participant fixated on the central point and pressed a key to start the stimulus presentation. After the presentation, the participant reported the two digits using the numeric keypad. The rate of correct answers for each condition was calculated from the trials in which both digits were correct.

Results

Figure 3 shows the average rate of correct answers as a function of EOA for the two stimulus conditions. The rate decreased as the EOA increased, with a steeper decrease in the luminance condition than in the disparity condition. A two-way analysis of variance (ANOVA) on stimulus type (luminance/disparity) and EOA showed a significant main effect of EOA, F(5,30) = 87.642, p < .01, ηG2 = .7060, but no main effect of stimulus type, F(1,6) = 0.006, p = .9408, ηG2 < .001. The interaction between the factors was significant, F(5,30) = 5.33, p < .01, ηG2 = .0947. Because the interaction was significant, we tested simple main effects: the simple main effect of stimulus type was significant only for the 2500-ms EOA condition, F(1,6) = 7.75, p < .05, ηG2 = .1702, and the simple main effect of EOA was significant for both stimulus types, luminance: F(5,30) = 69.79, p < .001, ηG2 = .834; disparity: F(5,30) = 35.83, p < .001, ηG2 = .5518. Based on these results, we conducted multiple comparisons among EOAs for each stimulus using Ryan's method. Under both the disparity and luminance conditions, the rate of correct answers decreased as the EOA increased, but the effect of the EOA was smaller for the disparity stimulus than for the luminance stimulus. Multiple comparisons showed significant differences among all EOA values except the three shortest conditions (0, 156.25, and 312.5 ms) for the luminance stimulus (p < .05), and among all EOA values except the four shortest conditions (0, 156.25, 312.5, and 625 ms) for the disparity stimulus (p < .05).
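As an illustration of the analysis reported above, a two-way repeated-measures ANOVA on stimulus type and EOA can be run on per-participant accuracies with statsmodels, as sketched below. The data-frame layout and the simulated accuracies are hypothetical, and the post hoc comparisons with Ryan's method are not included in this sketch.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per participant x stimulus type x EOA,
# with accuracy aggregated over trials (the actual data are not reproduced here).
rng = np.random.default_rng(1)
participants = range(1, 8)
eoas = [0, 156.25, 312.5, 625, 1250, 2500]
rows = [
    {"participant": p, "stimulus": s, "eoa": e,
     "accuracy": float(np.clip(1.0 - 0.0003 * e + rng.normal(0, 0.05), 0, 1))}
    for p in participants for s in ("luminance", "disparity") for e in eoas
]
df = pd.DataFrame(rows)

# Two-way repeated-measures ANOVA with stimulus type and EOA as within-subject factors.
res = AnovaRM(df, depvar="accuracy", subject="participant",
              within=["stimulus", "eoa"]).fit()
print(res.anova_table)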

Figure 3. Average correct rate as a function of element onset asynchrony (EOA; ms) for the two stimulus conditions. The error bars show the standard errors of the means. The horizontal axis shows the EOA (ms), the vertical axis shows the correct answer rate, and the red and blue lines show the results of the binocular disparity and luminance conditions, respectively.

Discussion

In Experiment 1, a stimulus image defined by luminance or disparity was divided and presented with the EOA as a variable, and the recognition rate of the number contained in the stimulus was measured to examine the temporal integration characteristics of both cues. The effect of the EOA was evident under both conditions: it became difficult to integrate the information when the elements were presented over a long period. In addition, we found an interaction between the EOA and the stimulus type (luminance/disparity): in the disparity condition, the rate of correct answers decreased more gradually with increasing EOA than in the luminance condition. This suggests that binocular disparity is superior to luminance in the ability to integrate and restore information presented sparsely over time.

We then asked whether the IMR was the factor critical for image perception. In previous studies in which a drawn image was divided and presented spatiotemporally, as in the present experiment, all elements were presented for an equal duration (Ikeda & Uchikawa, 1978; Unuma, 1992). In Experiment 1 of the current study, all of the divided elements were likewise displayed for the same duration. In this case, the trend in the results may reflect the IMR rather than the EOA among elements; that is, the rate of correct answers may have been high because the IMR was large rather than because the EOA was short. In Experiments 2 and 3, we therefore kept the IMR constant and investigated the temporal integration characteristics of luminance and disparity for image perception.

Experiment 2: Effect of EOA and Presentation Order on Temporal Integration of Shape Recognition Under Constant IMR

In Experiment 1, temporal integration was compared between stimuli defined by luminance and by binocular disparity. However, the IMR changed with the EOA condition. In Experiment 2, we conducted the same experiment as in Experiment 1 while keeping the IMR constant across conditions. If shape recognition depends on lower-order temporal integration, the correct response rate should decrease as the EOA increases. On the other hand, if the IMR is more important for shape recognition, the correct response rate should not change with the EOA but should differ between IMR conditions.

In addition, to investigate the effect of motion information on the integration of disparity and luminance information for shape recognition, we added a slit-vision condition, in which the visible area moved gradually from one edge of the target to the opposite side. If the correct response rate increases under this slit-viewing condition, we can infer that global motion information contributes to shape recognition through spatiotemporal integration along the motion trajectory of the slit, indicating that higher-order spatiotemporal integration occurs for visual object recognition.

Methods

Participants

Seven participants, including the three authors, took part in Experiment 2; all of them had also participated in Experiment 1.

Stimuli

There were four factors of stimulus presentation: stimulus type (luminance/disparity), EOA from the onset of the first element to the onset of the last element (156.25, 312.5, 625, 1250, and 2500 ms), presentation order (random/slit vision), and IMR (12%/24%).

Under the slit-vision condition, the stationary stimulus was viewed through a vertical slit window moving from right to left at a constant speed (57.28°/s, 28.64°/s, 14.32°/s, 7.16°/s, and 3.58°/s, corresponding to the five EOA conditions from shortest to longest; Figure 4(a)). Under the random presentation, the element dots were presented in random order (Figure 4(b)). The number of dots presented per frame (16.67 ms) and the presentation duration of each dot were set to be equal between these two conditions.

Figure 4. (a) The stimulus image under the instantaneous maximum ratio (IMR) 12% (left) and 24% (right) conditions in slit vision. (b) The stimulus image under the IMR 12% (left) and 24% (right) conditions in random presentation.

There were two IMR conditions: 12% and 24%. Out of the 800 dots presented in total, the instantaneous maximum number of dots was 96 in the 12% IMR condition and 192 in the 24% IMR condition (Figure 4). Under the slit-vision condition, the width of the slit depended on the IMR: 36 pixels (1.08°) for 12% IMR and 72 pixels (2.16°) for 24% IMR.

The presentation duration of each dot depended on the combination of the IMR and EOA conditions; the tables in Figure 5 show the duration for each IMR. In Experiment 2, the shorter the EOA, the shorter the presentation duration of each dot. Figure 5 shows the time course of the number of displayed dots for three EOA conditions (625, 1250, and 2500 ms). Whereas in Experiment 1 the number of simultaneously presented dots differed depending on the EOA condition (Figure 2), in Experiment 2 it was fixed for each IMR condition.

Figure 5. Time course of the number of presented dots in three EOA conditions (625, 1250, and 2500 ms). The upper figure shows the conditions for the IMR 12%. The lower figure shows the conditions for the IMR 24%. The horizontal axis is time (ms), and the vertical axis is the number of presented dots. The right tables show the duration of one dot for each EOA.

Note. EOA = element onset asynchrony; IMR = instantaneous maximum ratio.
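The parameters of the constant-IMR conditions follow simple relations: holding the IMR fixed while varying the EOA means that each dot's presentation duration scales with the EOA (duration ≈ IMR × EOA), the slit width is the IMR fraction of the 8.95° image width, and the slit speed is the image width divided by the EOA. The sketch below reproduces the reported slit widths and speeds under these assumed relations; small deviations would arise from rounding to whole frames and pixels.

# Relations between EOA, IMR, dot duration, slit width, and slit speed in
# Experiment 2 (a sketch assuming dot onsets are spread uniformly over the
# EOA; actual values were quantized to frames and pixels).
IMAGE_WIDTH_DEG = 8.95
EOAS_MS = [156.25, 312.5, 625, 1250, 2500]
IMRS = [0.12, 0.24]

for imr in IMRS:
    slit_width_deg = imr * IMAGE_WIDTH_DEG            # ~1.07 deg and ~2.15 deg
    print(f"IMR {imr:.0%}: slit width ~ {slit_width_deg:.2f} deg")
    for eoa in EOAS_MS:
        dot_duration_ms = imr * eoa                   # per-dot presentation time
        slit_speed = IMAGE_WIDTH_DEG / (eoa / 1000)   # deg/s; matches 57.28 ... 3.58
        print(f"  EOA {eoa:>7} ms: dot duration ~ {dot_duration_ms:6.2f} ms, "
              f"slit speed {slit_speed:5.2f} deg/s")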

Procedures

The procedure was the same as in Experiment 1. The participant reported the two-digit number using the numeric keypad. Under the slit-vision condition, the vertical slit revealing the stimulus elements moved from right to left, and the participants were allowed to move their eyes to pursue the slit during the presentation. Each participant completed a total of 800 trials (stimulus type 2 × presentation order 2 × IMR 2 × EOA 5 × 20 repetitions). The 40 condition combinations were presented in random order within a block, and the block was repeated 20 times.

Results

The left side of Figure 6 shows the correct response rates when the dots were presented in random order. In this random viewing condition, shape recognition from both luminance and disparity showed a slight increase in the correct response rate at shorter EOAs, peaking at an EOA of 312.5 ms. The correct response rate was higher for the 24% IMR condition than for the 12% condition.

Figure 6. Averaged correct rate across participants as a function of EOA. The error bars show the standard errors of the means. The horizontal axis is the EOA (ms) and the vertical axis is the correct answer rate. The left and right panels show the results of the random and slit-vision conditions, respectively. The upper and lower panels show the results of the luminance and disparity stimuli, respectively. The symbols indicate the IMR condition.

Note. EOA = element onset asynchrony; IMR = instantaneous maximum ratio.

The right-hand side of Figure 6 shows the average correct response rate for the slit viewing conditions for the luminance and disparity stimuli. In the luminance condition, the correct response rate clearly improved as the EOA increased. In the disparity condition, the trend was less clear than in the luminance condition, but there was a slight improvement in the correct response rate with the EOA. Furthermore, as in the random viewing condition, the difference in correct response rate by IMR was obvious for all EOAs, with 24% IMR having a higher correct response rate.

These results show that the correct response rate improved when the IMR was high in both the random and slit viewing conditions. Furthermore, in the random viewing condition, the correct response rate tended to improve with shorter EOAs, whereas in the slit viewing condition it tended to improve with longer EOAs, that is, with slower slit-window motion. To examine these trends statistically, we performed a four-way ANOVA (stimulus type, presentation order, IMR, and EOA) and found significant main effects of IMR and EOA, IMR: F(1,6) = 100.42, p < .01, ηG2 = .1179; EOA: F(4,24) = 5.554, p < .01, ηG2 = .0370. In addition, there were significant interactions between stimulus type and presentation order, F(1,6) = 93.687, p < .01, ηG2 = .1071, and between EOA and presentation order, F(4,24) = 18.062, p < .01, ηG2 = .0730. Furthermore, the second-order interaction of stimulus type, presentation order, and EOA was significant, F(4,24) = 6.67, p < .001, ηG2 = .023. We found simple interaction effects of stimulus type × presentation order at EOAs of 312.5 ms: F(1,6) = 12.36, p < .05, ηG2 = .0477; 625 ms: F(1,6) = 51.23, p < .001, ηG2 = .1635; 1250 ms: F(1,6) = 119.17, p < .001, ηG2 = .2453; and 2500 ms: F(1,6) = 20.15, p < .01, ηG2 = .1524; simple interaction effects of stimulus type × EOA for both the slit, F(4,24) = 2.79, p < .05, ηG2 = .027, and random presentations, F(4,24) = 3.12, p < .05, ηG2 = .0214; and simple interaction effects of presentation order × EOA for both the luminance, F(4,24) = 26.97, p < .001, ηG2 = .205, and disparity stimuli, F(4,24) = 3.70, p < .05, ηG2 = .0251.

We found simple–simple main effects of stimulus type under the slit presentation at the three longest EOAs, 625 ms: F(1,6) = 11.39, p < .05, ηG2 = .0238; 1250 ms: F(1,6) = 13.85, p < .01, ηG2 = .368; 2500 ms: F(1,6) = 7.45, p < .05, ηG2 = .0247, but no simple–simple main effect of stimulus type under the random presentation at any EOA. Interestingly, we also found simple–simple main effects of presentation order for the luminance stimulus at the three longest EOAs, 625 ms: F(1,6) = 72.3, p < .001, ηG2 = .4302; 1250 ms: F(1,6) = 122.19, p < .001, ηG2 = .629; 2500 ms: F(1,6) = 63.28, p < .001, ηG2 = .6681, and for the disparity stimulus at the two shortest EOAs, 156.25 ms: F(1,6) = 6.46, p < .05, ηG2 = .0426; 312.5 ms: F(1,6) = 8.76, p < .05, ηG2 = .1074. These simple–simple main effects indicate that, for the disparity stimulus, it is important that the elements are presented at short time intervals in the random local presentation of the image, whereas for the luminance stimulus a longer presentation of each element matters for the spatiotemporal integration of global motion through slit vision. Finally, we found simple–simple main effects of EOA for all combinations except Disparity × Slit, Luminance × Slit: F(4,24) = 11.5, p < .001, ηG2 = .3284; Luminance × Random: F(4,24) = 7.16, p < .001, ηG2 = .1244; Disparity × Random: F(4,24) = 4.90, p < .01, ηG2 = .0706.

Based on the ANOVA results, we conducted multiple comparisons for the EOA conditions with Ryan's method. Under the luminance and slit viewing conditions (the upper right panel in Figure 6), there were significant differences (p < .05) between EOA 156 ms and the other EOAs and between EOA 312 ms and the other EOAs; that is, the percentage of correct responses increased significantly as the EOA increased. Under the luminance and random viewing conditions (the upper left panel in Figure 6), there was a significant difference (p < .05) between EOA 312 ms and EOA 625 ms and beyond, confirming a peak at EOA 312 ms. In the disparity and random viewing condition (the lower left panel in Figure 6), significant differences were found between EOA 156 ms and 312 ms and between EOA 312 ms and 2500 ms (p < .05), statistically confirming the gradual trend with EOA. In the disparity and slit viewing condition (the bottom right panel in Figure 6), although the simple–simple main effect of EOA was not significant, a significant difference was found only between EOA 156 ms and EOA 2500 ms (p < .05).

Discussion

One purpose of Experiment 2 was to separate the influence of the IMR from the temporal integration characteristics. If low-order temporal integration is important in shape recognition, the correct response rate should decrease with increasing EOA. On the other hand, if the presentation duration of each element is important, the correct response rate should increase as the EOA increases, since the presentation time of each element and the EOA covary. A further prediction is that, if the number of dots presented at the same time is important, there should be a difference in the correct response rate between the two IMR conditions.

First, the results consistently showed that the rate of correct responses was high when the IMR was large under all conditions, suggesting that the IMR may be the main factor underlying the improvement in shape recognition at short EOAs in Experiment 1. This indicates that, if a large portion of the stimulus elements is visible at a given moment, shape recognition performance improves, which is a reasonable result. Beyond this effect, in the 12% IMR condition, the correct response rate showed a significant peak at EOA 312 ms for both the luminance and disparity stimuli. It may be that the shape recognition system actively attempts to perform temporal summation in low-information situations.

Another purpose of this experiment was to investigate the effect of the spatial order of element presentation. For this purpose, we added a slit-view condition in which the image of a stationary object was observed through a slit moving from right to left. As with random viewing, a robust effect of IMR was observed in slit viewing. We found an interaction between the presentation order and the stimulus type (disparity/luminance), indicating that slit vision affects the temporal integration of disparity and luminance differently. Furthermore, specifically in slit viewing, the longer the EOA for the luminance stimulus, that is, the slower the slit window, the higher the correct response rate. This improvement in recognition accuracy under slit vision was especially pronounced for the luminance stimulus and appears to be small for the disparity stimulus. This indicates that a spatiotemporal integration process using luminance motion information contributes to shape recognition, beyond simple low-order temporal summation within a short presentation period. These results indicate that the temporal integration mechanism in localized areas is common to disparity and luminance, whereas the mechanism for integration along global motion differs between the two. We discuss this possibility in the General Discussion section.

One possible cause of the lack of improvement with slit presentation for the disparity stimulus is the effect of eye movements. Because the participants could move their eyes during observation, pursuit eye movements may have been induced by the slit motion. In the random presentation, the participants tended to keep looking at the center of the stimulus, because they could not predict which elements would appear next. In slit vision, by contrast, the participants presumably moved their eyes from right to left along with the slit. With a stereo stimulus, diplopia would in principle occur unless the corresponding points of the left and right retinal images matched exactly, which makes it difficult to obtain binocular disparity information, or allows only degraded disparity information to be acquired (Otero-Millan et al., 2014). If the participants moved their eyes, the positions of the images would be blurred, making it difficult to match the two images accurately; in this case, the rate of correct answers might fail to improve specifically in the slit-vision condition for disparity. Indeed, some participants reported diplopia due to difficulties in binocular fusion during the slit-vision presentation. The latency of vergence eye movements required for stereopsis is approximately 170 ms (Yang et al., 2002), and following the moving slit may have interfered with vergence, so that participants could not establish stereoscopic vision. Therefore, in Experiment 3, to investigate the effect of eye movements, we added a condition in which the stimulus moved behind a static slit so that no pursuit eye movements would occur.

Experiment 3: Temporal Integration of Binocular or Luminance Cues for Shape Recognition of a Moving Object Behind a Stationary Slit

Experiment 2 showed that the rate of correct answers in the slit-vision condition increased relative to the random presentation for the luminance stimulus, whereas no difference was observed for the disparity stimulus. It is possible that disparity detection degraded because of eye movements following the moving slit. In Experiment 3, the position of the slit window was fixed, and the stimulus moved behind the slit. In this case, the IMR was unchanged, but no pursuit of the slit was required. If shape recognition from binocular disparity was degraded by the pursuit eye movements in Experiment 2, the rate of correct answers for the disparity stimulus should improve under the fixed-slit condition of Experiment 3.

Methods

Participants

The same seven individuals who participated in Experiments 1 and 2 took part in Experiment 3.

Stimuli

The stimuli were the same as in Experiments 1 and 2, except that in Experiment 3 the slit window was fixed on the screen and the stimulus dots moved from right to left behind it. The participant could observe the whole stimulus image over time while keeping a steady gaze on the center of the screen. As in Experiment 2, there were three factors of stimulus presentation: stimulus type (luminance/disparity), IMR (12%/24%), and EOA (156.25, 312.5, 625, 1250, and 2500 ms). The size of the slit window depended on the IMR (12%: 1.08°; 24%: 2.16°), and the dot speed depended on the EOA condition (57.28°/s, 28.64°/s, 14.32°/s, 7.16°/s, and 3.58°/s). These speeds were equal to the speeds of the moving slit in Experiment 2.

Procedures

The experimental procedure was the same as in Experiment 2. Each participant completed a total of 400 trials (stimulus type 2 × IMR 2 × EOA 5 × 20 repetitions).

Results

Figure 7 shows the average rate of correct answers as a function of EOA, and Figure 8 compares these results with those of the moving-slit condition in Experiment 2. For all conditions, the longer the EOA, the higher the rate of correct answers. Under the fixed-slit condition, the rate of correct answers was lower at short EOAs than under the moving-slit condition (Figure 8), whereas at long EOAs there was no difference between the moving- and fixed-slit conditions. Furthermore, whereas the correct response rate differed little among EOAs in the disparity condition of Experiment 2, it increased with the EOA in Experiment 3, indicating that recognition of the disparity-defined image was improved by the global motion of the image behind the fixed slit.

Figure 7. Averaged correct rate as a function of element onset asynchrony (EOA) in Experiment 3. Error bars show the standard error of the means. The horizontal axis shows the EOA (ms) and the vertical axis shows the correct answer rate. The left and right panels show the results of the disparity and luminance conditions, respectively.

Figure 8. Comparison of the results in Experiments 2 and 3. The dotted and solid lines indicate the results of Experiments 2 and 3, respectively. The left and right panels show the results of the disparity and luminance conditions, respectively.

A four-way ANOVA was conducted on these data with the factors of stimulus type (disparity or luminance), slit type (fixed or moving), IMR, and EOA. The results showed significant main effects for all factors other than slit type, stimulus type: F(1,6) = 21.326, p < .01, ηG2 = .2583; IMR: F(1,6) = 68.505, p < .01, ηG2 = .1452; and EOA: F(4,24) = 44.515, p < .01, ηG2 = .3398. The main effect of slit type was not significant, F(1,6) = 5.881, p = .0515, ηG2 = .0739. The interaction between slit type and IMR, F(1,6) = 18.137, p < .01, ηG2 = .0056, and that between slit type and EOA, F(4,24) = 7.154, p < .01, ηG2 = .0835, were significant. We also found second-order interactions of stimulus type × IMR × EOA: F(4,24) = 3.67, p < .05, ηG2 = .007; stimulus type × slit movement × EOA: F(4,24) = 3.20, p < .05, ηG2 = .0193; and IMR × slit movement × EOA: F(4,24) = 2.80, p < .05, ηG2 = .007. We focus here on the simple–simple effects of slit movement within the stimulus type × EOA combinations. There were significant simple–simple effects of slit movement for the luminance stimulus at 156.25 ms: F(1,6) = 14.00, p < .01, ηG2 = .5198, and 312.5 ms: F(1,6) = 14.03, p < .01, ηG2 = .4242. This means that, at long EOAs, the slit type (fixed or moving) did not affect the rate of correct answers for the luminance stimulus. In other words, these significant simple–simple effects indicate that the rise in the correct rate produced by temporal integration at shorter EOAs, as seen during moving-slit observation in Experiment 2, was suppressed in fixed-slit viewing.

Discussion

In Experiment 3, we investigated whether the result of Experiment 2, namely that the rate of correct answers did not increase in the slit-vision condition for the disparity stimulus, was due to eye movements pursuing the moving slit. If this assumption were valid, the rate of correct answers should be higher under the fixed-slit condition than under the moving-slit condition. There was a significant simple–simple effect of slit type on the luminance stimulus at short EOAs, whereas at long EOAs there were no differences in the rate of correct answers, regardless of slit type and stimulus type. Furthermore, the effect of temporal integration at short EOAs seen in Experiment 2 disappeared in the fixed-slit condition. However, restricting eye movements also produced no improvement in the correct response rate at long EOAs.

We consider it likely that the disappearance of the afterimage and the speed of the stimuli produced the low rate of correct answers in the fixed-slit condition at short EOAs. After seeing a strong light, observers perceive an afterimage for some time after the light disappears (Cohen-Duwek & Spitzer, 2018). Under the moving-slit condition, each dot was presented at a different position on the display. Therefore, an afterimage may be perceived under the short-EOA conditions, in which the slit moves at a very high speed and the participants make only slight eye movements in response. In this case, the dots can be perceived for longer than the actual presentation time, even for a short presentation. Under the fixed-slit condition, by contrast, the shorter the presentation time, the faster the stimulus movement; all of the dots were presented within the narrow spatial area of the slit, and the subsequent dot group masked the afterimage. We therefore suppose that the participants did not perceive an afterimage and that the information available for recognition was insufficient at short EOAs.

Although the variance was large and no simple–simple effect of slit type was observed, the correct response rate also decreased in the fixed-slit condition at short EOAs in the disparity condition. A possible reason is the loss of eye-movement-derived integration, which was controlled in Experiment 3. For example, Souto et al. (2019) investigated thresholds for coherent motion during pursuit eye movements and reported reduced sensitivity to coherent motion in the direction opposite to the pursuit. This indicates that somatic motor or efference copy information during pursuit eye movements plays an important role in visual field stability and motion detection. In addition to the local spatial masking, the lack of eye movement information in our experiment may have made spatiotemporal integration more difficult at short EOAs.

General Discussion

We aimed to clarify the recognition mechanism based on the temporal integration of binocular disparity by comparing it with that of luminance. In Experiment 1, two-digit numbers defined by binocular disparity (random-dot stereograms) or by luminance were presented with various EOAs. The results showed that the rate of correct answers was high when the EOA was short for both the disparity and luminance stimuli. In addition, the rate in the disparity condition decreased more gradually than in the luminance condition with increasing EOA, suggesting that binocular disparity is superior to luminance in the ability to integrate and restore sparsely presented information. In Experiments 2 and 3, when the ratio of observable dots (IMR) was fixed, quick stimulus presentation maximized the correct answer ratio under the random conditions, suggesting that rapid shape recognition is possible even for a pattern defined by binocular disparity. Slit vision with a longer exposure (EOA) increased the correct answer ratio for luminance-defined stimuli, implying spatiotemporal integration for shape recognition. The properties of temporal integration were similar for luminance and disparity, indicating a common time limit for shape recognition from distributed luminance and disparity data. However, luminance stimuli in slit vision yielded higher accuracy than in random presentation, unlike disparity stimuli.

Our results suggest that the IMR may be important in shape recognition when local information is presented sporadically. In previous studies such as Ikeda and Uchikawa (1978), where images were segmented and presented in random order, the segmented parts retained local information. In our study, since random dots were used, little local spatial information was preserved in the random viewing condition, and little was retained in the slit viewing condition either. The results of Experiments 1 and 2 indicate that the amount of spatial information presented at the same time, relative to the whole, is an important factor for shape restoration, regardless of whether the stimuli were luminance- or disparity-defined.

Furthermore, it was newly revealed that, given coherent motion information from the slit window, the spatiotemporal integration of disparity information was less conducive to shape recognition than that of luminance information. Comparing the slit viewing conditions of Experiment 2 for the luminance and disparity stimuli, we found that slowing the slit velocity allowed more information integration and improved the correct response rate for the luminance stimulus, but not for the disparity stimulus. We hypothesized that this might be because pursuit of the slit window suppressed some important components of binocular vision, such as vergence eye movements. In Experiment 3, we used a fixed-slit window to prevent pursuit eye movements. In both the disparity and luminance conditions, the overall correct response rate did not improve, and the correct response rate at short EOAs was lower in the fixed-slit condition than in the moving-slit condition. These results indicate that the suppression of retinal slip by pursuit eye movements is effective for low-order temporal integration. Our assumption here is that low-order temporal integration is similar to the summation of local spatial information in the temporal impulse response seen for luminance and binocular disparity (Cox et al., 2019). Such lower-order temporal integration would require a clearer retinal image input. If the temporal integration at short EOAs was suppressed by retinal slip of the moving images in Experiment 3, we suppose that the results for the disparity stimuli in Experiment 2, in which the correct response rate appears flat across EOA, reflect a combination of an effect of low-order temporal integration and an effect of high-order spatiotemporal integration driven by coherent motion information. If this interpretation is correct, the contribution of spatiotemporal integration of disparity information to shape perception may be smaller than that of luminance information, but we suppose that it is not completely absent, because previous work in the Gestalt tradition reported that shape recognition of lemon-like objects defined by binocular disparity through a moving-slit window was possible for many observers, although no objective data were presented (Fujita, 1990). Future research will require experiments that separate the effects of temporal integration at short EOAs from those of spatiotemporal integration based on global motion information.

Motion information is used to reconstruct divided images of a luminance-defined stimulus into a perceived global image, suggesting a contribution of the output of middle temporal (MT) pattern cells to image recognition (Nishida, 2004). The random presentation and the slit-vision condition used in the present study differed not only in the order of dot presentation but also in whether they included motion of the presented area of the stimulus. In the slit-vision condition with the luminance stimulus, global integration may have occurred through the contribution of MT neurons. On the other hand, we did not find such a contribution of motion information to the integration of the disparity stimulus. This suggests that such higher-order motion mechanisms are not involved in the integration of binocular disparity for pattern perception.

Although local one-dimensional motion can be detected in V1, the process of integrating local motion signals into a global motion occurs at or beyond area MT (Rust et al., 2006). It has also been reported that when different patterns of motion are presented to the left and right eyes, the information is not integrated into a global motion (Tailby et al., 2010). These observations indicate that the perception of global motion is mostly processed in MT, but that some of the information for perceiving global motion must be represented by monocular neurons in V1. Our result that performance in perceiving the global image from disparity did not increase in slit vision compared with random presentation is consistent with these previous studies, in the sense that information combined across the two eyes is not used for perceiving global motion.

In this study, we created stimuli defined by luminance and disparity and examined their characteristics of spatiotemporal integration for perceiving global shapes. We found that global motion information contributes to the perception of global shape only for a luminance-defined pattern, not for a disparity-defined pattern. The two temporal characteristics would likely complement each other, and stimuli combining disparity and luminance may have an advantage for shape recognition. Thus, as a next step, we plan to investigate the characteristics of spatiotemporal integration for stimuli that use both luminance and disparity. We hypothesize that the combination of these two cues will improve the ability to perceive the global image.

Supplemental Material

sj-gif-1-ipe-10.1177_20416695231224138
sj-gif-2-ipe-10.1177_20416695231224138
sj-gif-3-ipe-10.1177_20416695231224138

Footnotes

Author Contribution(s): Fumiya Haraguchi: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Visualization; Writing – original draft.

Rumi Hisakata: Conceptualization; Investigation; Methodology; Supervision; Visualization; Writing – review & editing.

Hirohiko Kaneko: Conceptualization; Formal analysis; Project administration; Supervision; Writing – review & editing.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

Supplemental material: Supplemental material for this article is available online.

Contributor Information

Fumiya Haraguchi, Department of Information and Communications Engineering, Tokyo Institute of Technology, Yokohama, Japan.

Rumi Hisakata, Department of Information and Communications Engineering, Tokyo Institute of Technology, Yokohama, Japan.

Hirohiko Kaneko, Department of Information and Communications Engineering, Tokyo Institute of Technology, Yokohama, Japan.

How to cite this article

Haraguchi, F., Hisakata, R., & Kaneko, H. (2024). Temporal integration characteristics of an image defined by binocular disparity cues. i-Perception, 15(1), 1–18. https://doi.org/10.1177/20416695231224138

References

1. Burr D. C., Morrone C. (1993). Impulse-response functions for chromatic and achromatic stimuli. Journal of the Optical Society of America A, 10, 1706. https://doi.org/10.1364/JOSAA.10.001706
2. Cohen-Duwek H., Spitzer H. (2018). A model for a filling-in process triggered by edges predicts "conflicting" afterimage effects. Frontiers in Neuroscience, 12, 559. https://doi.org/10.3389/fnins.2018.00559
3. Coltheart M. (1980). Iconic memory and visible persistence. Perception & Psychophysics, 27, 183–228. https://doi.org/10.3758/BF03204258
4. Cox M. A., Dougherty K., Westerberg J. A., Schall M. S., Maier A. (2019). Temporal dynamics of binocular integration in primary visual cortex. Journal of Vision, 19, 13. https://doi.org/10.1167/19.12.13
5. Fink M., Ulbrich P., Churan J., Wittmann M. (2006). Stimulus-dependent processing of temporal order. Behavioural Processes, 71, 344–352. https://doi.org/10.1016/j.beproc.2005.12.007
6. Fujita N. (1990). Three-dimensional anorthoscopic perception. Perception, 19, 767–771. https://doi.org/10.1068/p190767
7. Ikeda M., Uchikawa K. (1978). Integrating time for visual pattern perception and a comparison with the tactile mode. Vision Research, 18, 1565–1571. https://doi.org/10.1016/0042-6989(78)90012-3
8. Matusz P. J., Wallace M. T., Murray M. M. (2017). A multisensory perspective on object memory. Neuropsychologia, 105, 243–252. https://doi.org/10.1016/j.neuropsychologia.2017.04.008
9. Melcher D., Wutz A., Drewes J., Fairhall S. (2014). The role of temporal integration windows in visual perception. Procedia – Social and Behavioral Sciences, 126, 92–93. https://doi.org/10.1016/j.sbspro.2014.02.323
10. Nikolić D., Häusler S., Singer W., Maass W. (2009). Distributed fading memory for stimulus properties in the primary visual cortex. PLoS Biology, 7, e1000260. https://doi.org/10.1371/journal.pbio.1000260
11. Nishida S. (2004). Motion-based analysis of spatial patterns by the human visual system. Current Biology, 14, 830–839. https://doi.org/10.1016/j.cub.2004.04.044
12. Norman J. F., Dukes J. M., Shapiro H. K., Sanders K. N., Elder S. N. (2021). Temporal integration in the perception and discrimination of solid shape. Attention, Perception, & Psychophysics, 83, 577–585. https://doi.org/10.3758/s13414-020-02031-0
13. Otero-Millan J., Macknik S. L., Martinez-Conde S. (2014). Fixational eye movements and binocular vision. Frontiers in Integrative Neuroscience, 8, 52. https://doi.org/10.3389/fnint.2014.00052
14. Parks T. E. (1965). Post-retinal visual storage. The American Journal of Psychology, 78, 145–147. https://doi.org/10.2307/1421101
15. Quak M., London R. E., Talsma D. (2015). A multisensory perspective of working memory. Frontiers in Human Neuroscience, 9, 197. https://doi.org/10.3389/fnhum.2015.00197
16. Rust N. C., Mante V., Simoncelli E. P., Movshon J. A. (2006). How MT cells analyze the motion of visual patterns. Nature Neuroscience, 9, 1421–1431. https://doi.org/10.1038/nn1786
17. Shioiri S., Ikeda M., Uchikawa K. (1983). Visual and tactile pattern perception. KOUGAKU Japanese Journal of Optics (in Japanese), 12, 374–381.
18. Souto D., Chudasama J., Kerzel D., Johnston A. (2019). Motion integration is anisotropic during smooth pursuit eye movements. Journal of Neurophysiology, 121, 1787–1797. https://doi.org/10.1152/jn.00591.2018
19. Tailby C., Majaj N. J., Movshon J. A. (2010). Binocular integration of pattern motion signals by MT neurons and by human observers. The Journal of Neuroscience, 30, 7344–7349. https://doi.org/10.1523/JNEUROSCI.4552-09.2010
20. Teeuwen R. R. M., Wacongne C., Schnabel U. H., Self M. W., Roelfsema P. R. (2021). A neuronal basis of iconic memory in macaque primary visual cortex. Current Biology, 31, 5401–5414.e4. https://doi.org/10.1016/j.cub.2021.09.052
21. Unuma H. (1992). Spatio-temporal integration in visual perception. Japanese Psychological Research, 34, 158–164. https://doi.org/10.4992/psycholres1954.34.158
22. Yang Q., Bucci M. P., Kapoula Z. (2002). The latency of saccades, vergence, and combined eye movements in children and in adults. Investigative Ophthalmology & Visual Science, 43, 2939–2949. https://iovs.arvojournals.org/article.aspx?articleid=2162756
