Author manuscript; available in PMC 2013 Jun 1.
Published in final edited form as: J Math Biol. 2011 Jun 22; 64(7): 1157–1188. doi: 10.1007/s00285-011-0445-1

Visual Depth from Motion Parallax and Eye Pursuit

Keith Stroyan 1, Mark Nawrot 2
PMCID: PMC3348271  NIHMSID: NIHMS358745  PMID: 21695531

Abstract

A translating observer viewing a rigid environment experiences “motion parallax,” the relative movement upon the observer’s retina of variously positioned objects in the scene. This retinal movement of images provides a cue to the relative depth of objects in the environment; however, retinal motion alone cannot mathematically determine the relative depth of the objects. Visual perception of depth from lateral observer translation uses both retinal image motion and eye movement. In (Nawrot & Stroyan, 2009, Vision Res. 49, p.1969) we showed that the ratio of the rate of retinal motion over the rate of smooth eye pursuit mathematically determines depth relative to the fixation point in central vision. We also reported on psychophysical experiments indicating that this ratio is the important quantity for perception. Here we analyze the motion/pursuit cue for the more general, and more complicated, case when objects are distributed across the horizontal viewing plane beyond central vision. We show how the mathematical motion/pursuit cue varies with different points across the plane and with time as an observer translates. If the time-varying retinal motion and smooth eye pursuit are the only signals used for this visual process, it is important to know what is mathematically possible to derive about depth and structure. Our analysis shows that the motion/pursuit ratio determines an excellent description of depth and structure in these broader stimulus conditions, provides a detailed quantitative hypothesis of these visual processes for the perception of depth and structure from motion parallax, and provides a computational foundation to analyze the dynamic geometry of future experiments.

Introduction

Observer translation while viewing a rigid scene creates a continuously varying retinal image because of the change in relative position of objects from the observer’s point of view. This motion parallax is an important monocular cue for the visual perception of depth. However, we have only recently begun to understand the dynamic geometry in relation to the neural mechanisms serving the perception of depth. The subject is complicated by the fact that there are different “dynamic geometries” of retinal motion depending on the direction of observer motion and movement of the head and eyes. Helmholtz (1910, vol III, p.295) wrote about retinal motion and concentrated on the case of forward motion, a topic since studied in detail generally under the name “optic flow.” In the extreme case when one moves directly toward the fixate point, the “flow” on the retina is pure expansion. (In more general cases with a component of forward motion, the “focus of expansion” (FoE) is an important consideration. Lateral motion has no FoE at the crossing point.) However, von Kries, in (Helmholtz 1910, vol III, Note 4, p.371) writes, “… The changes of which he (Helmholtz) speaks are such as the observer would notice if he advanced forward without changing the attitude of his head or his eyes especially. In reality the phenomena are complicated by the fact that, supposing our attention is attracted, not by some object moving along with us, but by stationary external objects, we are invariably in the habit of keeping the eyes fastened for a brief space on some definite point, by turning them so as to counteract the effect of the forward motion of the body. …” von Kries concludes Note 4 with, “Now these apparent motions are just as useful as those described by Helmholtz for forming estimates of distance; and the probability is that both of them generally contribute to the result in some way, although it would be hard to say exactly how.” Our work is aimed at understanding the contribution of motion parallax to depth perception when one keeps “the eyes fastened for a brief space on some definite point.” More specifically, we concentrate on the case of lateral motion (on a translation axis perpendicular to the “fixate axis” described below); we assume the head does not rotate and the eyes maintain fixation by smooth eye pursuit. Our mathematics is intended to help design and analyze experiments that can begin to understand the different contributions of these two kinds of motion parallax to depth perception.

One can mathematically calculate the relative position of objects based on motion parallax if the direction and magnitude of the observer’s translation are known. Formulas like Equation (1) below for relative depth that rely on observer translation velocity have appeared in articles such as (Nakayama, Loomis, 1974), (Longuet-Higgins, Prazdny, 1980), and in others such as (Gordon, 1965), (Koenderink, van Doorn, 1976, 1987), (Perrone, Stone 1994), (Fermüller, Aloimonos, 1997), and (Hanes, et al, 2008). If one adds measurement of pursuit to the approach of Nakayama and Loomis (1974) or Longuet-Higgins and Prazdny (1980), it is easy to derive Equation (1). The new feature of Equation (1) and of this article in its more general 2D setting is the observation that the ratio (dθ/dα) of the rate of retinal motion over the rate of smooth pursuit is a mathematical quantity that could be used to estimate relative depth. There is both psychophysical and neurological evidence to suggest that perception depends on this ratio in the case of lateral motion. While observer translation speed can be used mathematically, it appears that the visual processes rely on a compensatory pursuit eye movement, generated during the observer translation, as a proxy for information about the direction and magnitude of observer translation (Nawrot & Joyce, 2006). That is, as the observer translates, the visual system maintains a stable point of fixation by rotating the eyes in the direction opposite the translation (Miles & Busettini, 1992; Miles, 1993; Miles, 1998). The visual system uses the smooth-pursuit eye movement component of this compensatory eye movement signal for the perception of unambiguous depth-sign from motion parallax (Nawrot, 2003; Nawrot & Joyce, 2006; Naji & Freeman, 2004; Nawrot & Stroyan, 2009; Nadler, Nawrot, Angelaki, DeAngelis, 2009). It is this dynamic geometry involving observer translation over time, a pursuit eye movement signal, and resulting retinal image motion, that we seek to understand.

In our previous work (Nawrot & Stroyan, 2009) we proposed a simple mathematical formula relating the perception of relative depth to the ratio of retinal image motion and pursuit eye movement. (The depth is relative to the fixation point described below.) Here we mathematically describe that situation. A laterally translating observer viewing a rigid environment (with no head rotation) fixates on one point F (the “fixate”) and must judge the relative position of another point D (called the “distractor” to be specific). Later in this work, the point D can be any point in the half-plane of the fixate point and translation axis in front of the eyes, but in (Nawrot & Stroyan, 2009), D is restricted to the positive “fixate axis” perpendicular to the translation axis (for distractors in central vision as the eye crosses the fixate axis). Translation with fixation produces two important changes in the two angles (α and θ) illustrated in Figure (1). Under lateral observer translation, both of these angles are continuously changing. The “tracking” angle α between the line to the fixate and the naso-occipito axis measures the rotation of the eye needed to maintain the point F on the fovea. (This is fixation, meaning the retinal position of F does not change.) Pursuit eye movement or “pursuit” corresponds to the change in α or derivative dα/dt. As the observer translates and maintains fixation on F, the image of D falls on the retina in a position determined by the “separation” angle θ between the line from the eye to the fixate and the line from the eye to the distractor. Retinal image motion corresponds to change in angle θ or the “motion” derivative dθ/dt.

Figure 1. The 1D case, D on the fixate-axis (d2 = 0), with tracking angle α[t] and separation angle θ[t].

Using only simple geometry, Nawrot and Stroyan (2009) demonstrated that the relative depth of a distractor on the fixate axis of Figure (1) is given by Equation (1), the 1D “motion/pursuit law”:

$\frac{d}{f} = \frac{d\theta}{d\alpha}\,\frac{1}{1 - \frac{d\theta}{d\alpha}}$  (1)

with fixate at distance f and distractor d farther along the fixate-axis and all derivatives evaluated at t = 0 when the observer crosses the central intersection of the (earth-fixed) axes of Figure (1) and has D in central vision. Equation (1) only describes depth in central vision and observation when t = 0. (We use the differential notation for the ratio of derivatives dθ/dα = (dθ/dt)/(dα/dt), etc.) This provides a very useful model in one distractor dimension of central vision. However, in our previous work the empirical relation between the motion/pursuit law and relative depth was addressed only for a small set of conditions including lateral motion, points aligned in central vision, and a limited range of motion/pursuit ratios. While the motion/pursuit law gives the precise relative depth in those limited conditions, the generalizability of the motion/pursuit law to a broad range of points in the horizontal plane or a range of ratios and fixate distances remains to be determined. Therefore, the goal of the current work is to expand our understanding of how the perception of depth from motion parallax might function within this expanded range of parameters reflecting a broader range of viewing conditions with time-varying observation, and demonstrate that the motion/pursuit law could be a general processing strategy.
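As a concrete numerical illustration (ours, not from the original paper), a minimal Python sketch of Equation (1): simulated motion and pursuit rates for an observer translating at speed s past a fixate at distance f, with a distractor d farther along the fixate axis, recover d/f exactly at t = 0. The function and variable names are our own.

```python
def rates_at_t0(f, d, s):
    """Pursuit and retinal motion rates at t = 0 for the 1D case
    (distractor on the fixate axis; Equations (11)-(12) with d2 = 0)."""
    pursuit = s / f                     # dα/dt at t = 0
    motion = s * d / ((d + f) * f)      # dθ/dt at t = 0 when d2 = 0
    return motion, pursuit

f, d, s = 100.0, 50.0, 80.0             # cm, cm, cm/sec
motion, pursuit = rates_at_t0(f, d, s)
ratio = motion / pursuit                # motion/pursuit ratio dθ/dα = 1/3 here
print(ratio / (1.0 - ratio))            # Equation (1): d/f = 0.5; speed s cancels
```

Note that the translation speed s cancels in the ratio, which is the point of the motion/pursuit law: relative depth is available without knowing the observer’s speed.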

Section 1 gives the general 2-dimensional mathematical model of motion parallax for lateral translation with fixation. We use two coordinate systems, each useful for different computations, and give the basic derivative computations in both systems. Points on the line from the eye through the fixate have an exact relative depth generalizing Equation (1) above as Equation (15) below.

Section 2 describes the variation of the motion/pursuit ratio across the plane determined by the eyes and fixate (usually the horizontal plane) at a fixed time. Specifically, we show that this ratio is invariant on circles described in that section. This invariance is “bad” in a sense, but binocular disparity has a similar “difficulty.” For example, binocular disparity is zero for all points on the circle through the fixate and both eye nodes even though different points on this “Vieth-Müller circle” (or horopter) have different nonzero depths. The invariant circles show that the instantaneous depth estimates of the motion/pursuit formula are less accurate the farther the distractor is from central vision. On the other hand, the invariant circles together with the in-line case give a simple geometric method with which to analyze the otherwise complicated formulas.

Unlike static stereopsis, the motion/pursuit ratio of a translating observer changes with time, so the motion/pursuit formula predicts a different relative depth at different times during the translation. Section 3 compares the time-varying motion/pursuit formula with two measures of depth: relative distance beyond the fixate and relative distance from the fixate. Roughly speaking, these are the egocentric and geocentric (or allocentric) measures of depth. The motion/pursuit formula only provides an accurate estimation of relative depth for some points in the plane at a specific time. However, these estimations of relative depth improve if the motion/pursuit law operates over an interval of time and continues to update the estimates of depth. People can perceive the motion/pursuit cue in less than 150 msec, “instantaneously,” corresponding to the formula at a single t (Nawrot & Stroyan, 2010), but maintain fixation much longer. If visual depth perception integrates the motion/pursuit cue for these longer intervals, our analysis shows how accurately the mathematics could predict each measure of depth.

Section 4 studies the geocentric “structural” relative depth in more detail. Specifically, we give a formula for the time at which the motion/pursuit cue achieves its maximum and show that the peak-time motion/pursuit cue could also be used to recover “structure-from-motion” for a large portion of the plane. This might correspond to generating your “best estimate” of the position of each point as you move and then combining these estimates into a shape. (We are aware of other peaks in perceptual cues, such as the brightness of a rotating lighthouse.) It is unknown whether the human perception of relative depth or “structure-from-motion” makes use of these novel position-varying maximal cues received during observer translation, but our analysis suggests that the peak motion/pursuit cue is a plausible source of valuable visual information. Our analysis shows just how much could be mathematically possible in re-constructing rigid geometry in the brain from peak cues received during translation. Neither the peak-time observation of Section 4 nor the “integrated” view of Section 3 has been studied empirically, and they are not mutually exclusive.

Flat retina (or eye-centered homogeneous or projective) coordinates are used extensively in the vision literature (Longuet-Higgins & Prazdny, 1980; Fermüller & Aloimonos, 1997). Section 5 shows how to get “flat depth” from the motion/pursuit ratio expressed in those coordinates. A complication of the flat retina depth formula is that perception of the direction of observer translation is used in addition to retinal position, retinal motion, and rate of pursuit. Partly for this reason, we have not analyzed it in as much detail as our angular approach, where we focus mainly on what information the brain could derive mathematically from just the motion/pursuit ratio. Flat depth has a different invariance structure that could possibly be used to decide whether it is used empirically. The section also gives a comparison between the flat retina approach and the angular approach.

Our analysis will help us to better understand the dynamic geometric information available for the perception of depth from motion parallax and will also aid subsequent quantitative analyses of human psychophysical experiments. For example, while the perception of depth from motion parallax is not veridical, the motion/pursuit cue also differs from true geometrical measures of depth at different times. The ability to compare the motion/pursuit formula to true structural depth and to actual visual depth perception, especially in conditions when the formula predicts a deviation from veridical, will certainly help us to better understand the neural mechanisms serving the perception of depth from motion parallax.

Section 1: Coordinates for the Fixation Plane and Angles for Tracking and Separation

This section gives the general 2-dimensional mathematical model of motion parallax for lateral translation of an observer who is fixated on a particular point. We use two coordinate systems, each useful for different computations and give the basic derivative computations in both systems.

We begin with rigid (earth centered) 2D cartesian coordinate axes for the horizontal viewing plane, one represented left to right on the page, the other up and down in Figure (2).

Figure 2. Coordinate axes for the horizontal plane.

The fixation point is the 2D vector with coordinates F = {0, f} on the fixate axis. Distractor points in the horizontal plane are also represented as 2D vectors in this coordinate system, but we represent one specific distractor by D = {d2, f + d1}, so that the vector difference D − F = {d2, d1} gives the location of the distractor relative to the fixate as shown in Figure (2). Denoting the position of D relative to F is important because this represents the relative depth generated by the motion parallax cue. These could be considered fixate-centered or “allocentric” coordinates for D. When d2 = 0, the distractor lies on the fixate axis and the simple Equation (1) gives the relative depth of D exactly. The variation of the geometric cues is more complicated when d2 ≠ 0, and this section derives the basic geometric Formulas (7)-(10) below. Later sections analyze spatial and temporal aspects of the formulas as perceptual cues.

As illustrated in Figure (2), the observer translates along the left-to-right “translation axis” with the right eye passing the central intersection point at t = 0. Only the observer translates; the fixate F, distractor D, and the axes measuring the location of D and F are assumed to remain rigid and earth fixed. Also, the observer’s head does not rotate; the naso-occipito axis remains perpendicular to the direction of translation, and parallel to the fixate axis, and the eye rotates to “pursue” the fixate or maintain stable fixation of F on the fovea. When the eye is in the central position (t = 0), the naso-occipito axis coincides with our rigid up-and-down “fixate axis.”

As illustrated in Figure (3), the vertical reference line next to α points in the direction of the naso-occipito axis, so that the angle α from the vertical to the line from the eye to F measures the angle that the eye must turn in order to maintain “fixation” from its position at time t; that is, the angle α keeps the image of F on the fovea of the eye. We call α the tracking angle and the time rate of change dα/dt the “pursuit.” The angle θ separating the lines from the fixate F and distractor D changes as the observer translates laterally, causing the image of D to move on the retina. (The image of F does not move on the retina because fixation maintains its position on the fovea.) We measure this angle positive if it is counterclockwise from the D line to the F line and negative for clockwise. The time rate of change dθ/dt is a measure of the retinal image motion of D, so dθ/dt is called “motion.”

Figure 3. The right eye tracking angle α[t] and separation angle θ[t], shown during rightward observer translation.

It is important to introduce two mathematical simplifications as we move from the simple 1D case (d2 = 0) to the more involved 2D case (d2 ≠ 0). First, the angle β = α − θ is a mathematical convenience that simplifies Equation (1) for the 1D case above to its equivalent form in Equation (2).

$\frac{d}{f} = \frac{d\theta}{d\beta}$ at $t = 0$, if $d = d_1$ and $d_2 = 0$.  (2)

The angle β does not correspond to an obvious construct the way that angle α corresponds to eye position, and angle θ corresponds to the retinal image position of point D. However, using the angle β makes the mathematical derivations more manageable. Moreover, the angle β is always used in a ratio with angle θ so that the separate effects of θ and α are not confounded. That is, the ratio dθ/dβ = dθ/dα(1/ (1 − dθ/dα)) is determined by the value of the motion/pursuit ratio, dθ/dα. Similarly, dθ/dα = dθ/dβ/(1 + dθ/dβ), either ratio determines the other. Therefore, this analysis will rely on β to help show how the 2D time-varying motion/pursuit formula dθ/dβ behaves when distractors lie in the horizontal plane (the 2D case).
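A quick consistency check (ours) of the conversion between the two ratios in either direction:

```python
r = 0.25                                  # an example motion/pursuit ratio dθ/dα
mp = r / (1 - r)                          # dθ/dβ = (dθ/dα)·1/(1 − dθ/dα) = 1/3
assert abs(mp / (1 + mp) - r) < 1e-12     # inverse: dθ/dα = (dθ/dβ)/(1 + dθ/dβ)
```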

The second mathematical construct is to describe the scene in “egocentric coordinates” or, more precisely, “fovea-centered” polar coordinates shown in Figure (4). (Eye-centered projective “flat retina” coordinates are considered in a section below for comparison with that portion of the vision literature.) The coordinates of a point D from the eye fixated on F are {ρ, θ}, where ρ is the distance from the center of the eye to the point D and θ is the separation angle between the line to the fixate and the line to D. These are functions of D and t, so if needed they are denoted ρ[D, t] or ρD, or ρF, etc.

Figure 4. Eye-centered polar coordinates ρ with separation angle θ, tracking angle α, and β = α − θ.

In the rigid (fixate-axis/translation-axis) cartesian coordinate system, the eye (E) of an observer translating at speed “s” has coordinates E[t] = {s t, 0}, so the angles α and β satisfy the relations

$\tan\alpha = \frac{s\,t}{f}$  &  $\tan\beta = \frac{s\,t - d_2}{d_1 + f}$  (3)

Differentiating Equation (3) with respect to time, we obtain Equation (4).

$\frac{1}{\cos^2\alpha}\,\frac{d\alpha}{dt} = \frac{s}{f}$  &  $\frac{1}{\cos^2\beta}\,\frac{d\beta}{dt} = \frac{s}{d_1 + f}$  (4)

Use f = ρF Cos[α] and d1 + f = ρD Cos[β] to obtain Equation (5).

$\frac{d\alpha}{dt} = \frac{s}{\rho_F}\cos\alpha$  &  $\frac{d\beta}{dt} = \frac{s}{\rho_D}\cos\beta$  (5)

We know β = α − θ, so θ = α − β and dθ/dt = dα/dt − dβ/dt, giving Equation (6).

$\frac{d\theta}{dt} = \frac{s}{\rho_F}\cos\alpha - \frac{s}{\rho_D}\cos\beta = \frac{s\cos\alpha}{\rho_F}\left(1 - \frac{\rho_F}{\rho_D}\,\frac{\cos\beta}{\cos\alpha}\right)$  (6)

Finally, ratios give Equations (7)-(10), the main formulas for “motion,” “pursuit,” and the important ratios dθ/dβ and dθ/dα. These derivatives can be expressed in cartesian coordinates by substituting expressions like $\rho_F = \sqrt{f^2 + s^2 t^2}$ and f = ρF Cos[α], yielding the equivalent cartesian formulas in Equations (7)-(10).

$\frac{d\alpha}{dt} = \frac{s}{\rho_F}\cos\alpha = \frac{s\,f}{f^2 + s^2 t^2}$  (7)

$\frac{d\theta}{dt} = \frac{s\cos\alpha}{\rho_F}\left(1 - \frac{\rho_F}{\rho_D}\,\frac{\cos\beta}{\cos\alpha}\right) = \frac{s}{f^2 + s^2 t^2}\left(f - \frac{(d_1+f)(f^2 + s^2 t^2)}{(d_1+f)^2 + (d_2 - s t)^2}\right)$  (8)

$\frac{d\theta}{d\beta} = \frac{\rho_D\,\frac{\cos\alpha}{\cos\beta} - \rho_F}{\rho_F} = \frac{f\left(d_1(d_1+f) + d_2^2 - 2 d_2 s t\right) - d_1 s^2 t^2}{(d_1+f)\left(f^2 + s^2 t^2\right)}$  (9)

$\frac{d\theta}{d\alpha} = 1 - \frac{\cos\beta}{\cos\alpha}\,\frac{\rho_F}{\rho_D} = \frac{f\left(d_1(d_1+f) + d_2^2 - 2 d_2 s t\right) - d_1 s^2 t^2}{f\left((d_1+f)^2 + (d_2 - s t)^2\right)}$  (10)

The different expressions are useful for different purposes. When t = 0 the cartesian formulas simplify to:

$\frac{d\alpha}{dt}[0] = \frac{s}{f}$  (11)

$\frac{d\theta}{dt}[0] = \frac{\left(d_1^2 + d_2^2 + d_1 f\right)s}{\left((d_1+f)^2 + d_2^2\right)f}$  (12)

$\frac{d\theta}{d\beta}[0] = \frac{d_1(d_1+f) + d_2^2}{f\,(d_1+f)}$  (13)

$\frac{d\theta}{d\alpha}[0] = \frac{d_1(f+d_1) + d_2^2}{(f+d_1)^2 + d_2^2}$  (14)
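The cartesian forms are easy to check numerically. The sketch below (ours; the names are our own) compares the closed-form Equation (9) against finite-difference derivatives of the angle definitions in Equation (3):

```python
import math

def angles(f, d1, d2, s, t):
    """Tracking angle α and separation angle θ = α − β from Equation (3)."""
    alpha = math.atan2(s * t, f)
    beta = math.atan2(s * t - d2, d1 + f)
    return alpha, alpha - beta

def dtheta_dbeta(f, d1, d2, s, t):
    """Equation (9), cartesian form."""
    num = f * (d1 * (d1 + f) + d2**2 - 2 * d2 * s * t) - d1 * s**2 * t**2
    return num / ((d1 + f) * (f**2 + s**2 * t**2))

f, d1, d2, s, t, h = 100.0, 50.0, -20.0, 80.0, 0.5, 1e-6
a0, th0 = angles(f, d1, d2, s, t - h)
a1, th1 = angles(f, d1, d2, s, t + h)
dtheta = (th1 - th0) / (2 * h)            # dθ/dt by central differences
dbeta = (a1 - a0) / (2 * h) - dtheta      # dβ/dt = dα/dt − dθ/dt
print(dtheta / dbeta, dtheta_dbeta(f, d1, d2, s, t))   # the two should agree
```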

If F and D are in line with the eye at a particular time so θ = 0, or α = β, then the polar formulas simplify to:

$\frac{d\theta}{d\beta} = \frac{\rho_D - \rho_F}{\rho_F}$  (15)

$\frac{d\theta}{d\alpha} = \frac{\rho_D - \rho_F}{\rho_D}$  (16)

which expresses the relative depth of the two points F and D in terms of a ratio of the rates of change of the angles, dθ/dβ and dθ/dα. At the instant t when the fixate and distractor are in line, θ = 0, dθ/dβ exactly equals the depth of the distractor from the eye relative to the (changing) distance to the fixate, generalizing Equations (1) and (2) to cases where d2 ≠ 0.

Section 2: Invariant Circles for 2D Distractors

This section shows how the motion/pursuit formula dθ/dβ[t] varies across the (half) plane in front of the eye at a fixed time t. Ideally, a visual cue to the relative depth of an object would vary only with distance from the fixation point. However, both binocular disparity and motion parallax are constant on (different) circles containing points of varying depth. For binocular disparity, the horopter connects points of zero retinal disparity, but the horopter does not maintain a fixed distance from the observer. The motion/pursuit ratio has a similar geometry at a fixed time t.

To explain the case for motion parallax, recall that, as detailed earlier, at time t and translation speed s the eye is located at E[t] = {st, 0} on the translation axis. At a positive time the eye is to the right of the central point, {0, 0}. Figure (5) shows the plane in front of the eye with the eye position and the fixate F. The motion/pursuit law gives the exact relative depth of a distractor D on the line from the eye through the fixate using the formula dθ/dβ = dθ/dα(1/(1 − dθ/dα)) in Equation (15). Again, it is important to note that dθ/dt and dα/dt are perceptual cues, while dβ/dt = dα/dt − dθ/dt is a simplifying mathematical convenience and does not correspond to a distinct perceptual cue.

Figure 5. Circle for the two-dimensional motion/pursuit ratio at a fixed t > 0. Eye at E = {40, 0}, F = {0, 100}, D = {−25, 100+61}, D′ = {125, 100+32}; ∣D − E∣ = ρD = 173 and ∣D′ − E∣ = ρD′ = 157.

A circle tangent to the translation axis at E through the in-line distractor D is also shown in Figure (5). The value of dθ/dβ[t] at a fixed time t is constant on this whole circle, so all the distractor points like D′ on this circle produce the same motion/pursuit ratio cue as the in-line distractor D, dθ/dβ[D, t] = dθ/dβ[D′, t]. The point D′ is not on the line through the eye and fixate as shown in Figure (5). Notice that there are points on the invariant circle both farther from the eye than D and closer to the eye. In the example illustrated in Figure (5), the distance from the eye to D is 173 cm and the distance from the eye to D′ is 157 cm even though the value of dθ/dβ is the same for both points at the particular time and eye position shown. The following sections will derive this invariant circle result and then show how time-varying observation can still give an accurate perception of depth.

The retinal motion derivative dθ/dt for a fixate F = {0, f} and 2D distractor D = {d2, d1 + f}, measured from the eye node, is given by Equation (8). We can use the cartesian formula to see that, at a fixed time t, retinal motion has the same value for all distractors on a circle that has diameter on the naso-occipito axis and contains the eye and D. Constant retinal motion, dθ/dt = c, or

$\frac{d\theta}{dt} = \frac{s}{f^2 + s^2 t^2}\left(f - \frac{(d_1+f)(f^2 + s^2 t^2)}{(d_1+f)^2 + (d_2 - s t)^2}\right) = c$  (17)

is equivalent to D lying on the circle with radius r and center {st, r}:

$\left((d_1 + f) - r\right)^2 + (s t - d_2)^2 = r^2$  where  $\frac{1}{2r} = \frac{f}{f^2 + s^2 t^2} - \frac{c}{s}$  (18)

(Note: An equation of the circle in the x-y-plane with center {a, b} and radius r is $(y - b)^2 + (a - x)^2 = r^2$. To see that Equation (18) is a circle centered at {st, r}, replace x with d2, y with d1 + f, b with r, and a with st. The radius is given by solving the equation on the right of (18) using the constant value c of dθ/dt.)

For example, when $\frac{d\theta}{dt}[0] = c$ for the fixate {0, f} and distractor {0, f + d}, the diameter is from {0, 0} to {0, f + d}, with radius (f + d)/2 and center {0, (f + d)/2}. These circles are shown in the surface and contour plots of dθ/dt in Figure (6).
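A short sketch (ours) of the invariance claim: fix t, read off the constant c from one distractor, solve the right side of Equation (18) for r, and sweep the circle; Equation (8) returns the same dθ/dt everywhere on it. The names are our own.

```python
import math

def dtheta_dt(f, d1, d2, s, t):
    """Equation (8): retinal motion rate in cartesian coordinates."""
    return (s / (f**2 + s**2 * t**2)) * (
        f - (d1 + f) * (f**2 + s**2 * t**2) / ((d1 + f)**2 + (d2 - s * t)**2))

f, s, t = 100.0, 80.0, 0.5
c = dtheta_dt(f, 50.0, -25.0, s, t)           # constant from one sample distractor
r = 0.5 / (f / (f**2 + s**2 * t**2) - c / s)  # radius from Equation (18)
for phi in (0.5, 1.0, 1.5, 2.0, 2.5):         # sweep the circle (φ = 0 is the eye)
    d2 = s * t + r * math.sin(phi)            # center {s·t, r}, tangent at the eye
    d1 = r * (1 - math.cos(phi)) - f          # since d1 + f = r(1 − cos φ)
    print(dtheta_dt(f, d1, d2, s, t))         # prints the same value c each time
```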

Figure 6. Invariant circles for the two-dimensional retinal motion rate. Left: t = 0. Right: t > 0.

The vertical dimension on the three-dimensional plots shows different values of retinal image motion (dθ/dt), which are collapsed and shown as lines on the two-dimensional plots.

However, retinal motion (Figure 6) is only part of the requisite information. It is important to consider the combination of retinal motion and pursuit information. Since the angle α (eye-fixate tracking angle) depends only on the position of the eye relative to the fixate, and not on any distractor, dα/dt is constant on the dθ/dt-constant circles at a fixed time. Also, since β = α − θ, it follows that dβ/dt = dα/dt − dθ/dt, and thus dβ/dt does not vary on a circle where dθ/dt is constant at that time. Therefore the motion/pursuit formulas dθ/dα and dθ/dβ are constant on these constant retinal motion circles at a fixed time. The graph (Figure 7) of dθ/dβ is a simple (slanted) cone with apex at the eye at {st, 0, −1} and containing the line to the fixate at {0, f, 0}. This line (Figure 7) is the special case of dθ/dβ when fixate and distractor are in line with the eye, θ = 0, in Equation (15). In effect, this cone could be used to compute dθ/dβ at a particular time by computing the in-line case and moving around the invariant circle to other distractors, rather than using Equation (9). One could also compute the motion/pursuit ratio, dθ/dα[t] = dθ/dβ[t]/(1 + dθ/dβ[t]), this way using dθ/dβ or Equation (16) and the motion-invariant circles. Indeed, the simple motion/pursuit ratio dθ/dα may turn out to be important in understanding the calculations performed by the underlying neural mechanisms, but we suppress those calculations here because they are harder to relate to the rigid geometry (see Nawrot, Stroyan (2009) Equation (4) and Figure 7).

Figure 7. Invariant circles for the motion/pursuit formula. Left: t = 0. Right: t > 0.

Since Formulas (7)-(10) are rather complicated, the invariant circles of retinal motion are helpful in analyzing and visualizing the spatial variation of the motion/pursuit cues. (These circles vary with time since they are always tangent to the translation axis at the eye, so the time variation of the motion/pursuit cue is also important, as shown below.) Geometrically, binocular disparity is invariant on circles through the eye nodes and a distractor (see [SN.2008.8]). These circles approximate the invariant circle for retinal motion (Stroyan, 2010). While there are measures of the empirical horopter for stereopsis (e.g., Hillis & Banks, 2001), we know of no measurement of the corresponding empirical invariant circle for motion parallax.

The circles derived in this section are similar to ones derived in (Hanes, et al., 2008) and (Wang and Cutting, 1999). Wang and Cutting derived circles to distinguish which visual points will converge or diverge from the focal point during translation of the eye. Hanes et al. derived circles to distinguish which points would move left or right on the retina, and which would converge towards the eye more quickly than the focal point. Both studies have the same premise, of focus on one point (or tracking) during translation of the eye. The results presented here are a complement to their work with another “circle analysis.”

Section 3: The Motion/Pursuit Formula and Measures of Depth

The motion/pursuit formula suggests that proximal visual cues, retinal image motion and pursuit eye movement signal, could be used to generate an internal representation of the position of objects in a visual scene and our experiments in Nawrot & Stroyan (2009) indicate that perception depends only on the motion/pursuit ratio. This section describes how the time-varying motion/pursuit Formula (9) is related to two different types of relative depth. In the case where the distractor is on the fixate axis, the two types of relative depth coincide and equal the time zero motion/pursuit formula, d / f = dθ/dβ[0], Equation (1) or (15). For distractors in the horizontal plane with d2 ≠ 0, the time-varying motion/pursuit formula varies roughly between the (egocentric) eye-centered relative depth and the (geocentric or allocentric) rigid geometric relative structural distance (both described mathematically below). The following subsections give the two depth measures and some basic comparisons between those different versions of relative depth and Formula (9). When we perceive that a distractor is beyond a fixate, but off to one side, it is not clear which sort of depth is represented in the brain (or if perception is veridical).

Egocentric depth

A generalization of the relative depth d/f for one dimension is given in Equation (19) as vector norms (left), as eye-centered radial distances (center), and in cartesian coordinates (right), where E[t] = {st, 0} is the location of the eye node at time t. The formula in Equation (19) includes a sign with (−) for near, (+) for far.

$\frac{(\text{distance from eye to distractor}) - (\text{distance from eye to fixate})}{(\text{distance from eye to fixate})} = \frac{\|D - E[t]\| - \|F - E[t]\|}{\|F - E[t]\|} = \frac{\rho_D - \rho_F}{\rho_F} = \frac{\sqrt{(d_2 - s t)^2 + (d_1 + f)^2} - \sqrt{(s t)^2 + f^2}}{\sqrt{(s t)^2 + f^2}}$  (19)

We will call this the time varying relative eye-centered depth or “relative depth” for short. The depth Formula (19) is negative for near points and positive for far ones. (We also discuss another relative depth in the section below on fovea-centered flat retina coordinates, see Formula (31).) Graphs of relative depth are shown in Figure (8) for t = 0 and for t = 1/2 at a translation speed 80cm/sec. The fixate point is labeled F and is at F = {0, 100} for this example. The contour graphs show the circles of constant relative depth centered at the eye. Notice that the surface graph for relative depth is a right half-cone with apex at the eye. The base plane represents points in the horizontal plane and the height is the dimensionless relative depth ratio.

Figure 8. Left: eye-centered relative depth of distractors in the horizontal plane at t = 0. Right: eye-centered relative depth at t = 1/2.

For clarity, we give some numerical examples corresponding to the graphs. The two distractors marked on the graphs are D1 = {0, 150} and D2 = {−20, 150}. These are at exaggerated distances so that they can be more easily seen on the graphs and are chosen so they line up with the fixate at the two times chosen. The relative depths at the times t = 0 and t = 1/2 are as follows:

$\rho_{D_1}[0] = \|D_1 - E[0]\| = \|\{0,150\} - \{0,0\}\| = \|\{0,150\}\| = \sqrt{0^2 + 150^2} = 150$

$\rho_{D_1}[\tfrac{1}{2}] = \|D_1 - E[\tfrac{1}{2}]\| = \|\{0,150\} - \{40,0\}\| = \|\{-40,150\}\| = \sqrt{40^2 + 150^2} = 10\sqrt{241} \approx 155.2$

When the observer is farther from the fixate point, values of eye-centered relative depth are lowered by the longer distance from the eye to the fixate. In this example, in 1/2 second at 80 cm/sec, the eye moves to E[1/2] = {40, 0}, so that the graphs on the right have an eye to fixate distance of

$\rho_F[\tfrac{1}{2}] = \|F - E[\tfrac{1}{2}]\| = \|\{0,100\} - \{40,0\}\| = \|\{-40,100\}\| = \sqrt{40^2 + 100^2} = 20\sqrt{29} \approx 107.7$

This makes the eye-centered relative depths for D1 (at t = 0 and t = 1/2 respectively):

$\frac{\|D_1 - E[0]\| - \|F - E[0]\|}{\|F - E[0]\|} = \frac{150 - 100}{100} = 0.5$

$\frac{\|D_1 - E[\tfrac{1}{2}]\| - \|F - E[\tfrac{1}{2}]\|}{\|F - E[\tfrac{1}{2}]\|} = \frac{10\sqrt{241} - 20\sqrt{29}}{20\sqrt{29}} \approx 0.441$

The eye-centered distances for D2 are:

$\rho_{D_2}[0] = \|D_2 - E[0]\| = \|\{-20,150\} - \{0,0\}\| = \|\{-20,150\}\| = \sqrt{20^2 + 150^2} = 10\sqrt{229} \approx 151.3$

$\rho_{D_2}[\tfrac{1}{2}] = \|D_2 - E[\tfrac{1}{2}]\| = \|\{-20,150\} - \{40,0\}\| = \|\{-60,150\}\| = \sqrt{60^2 + 150^2} = 30\sqrt{29} \approx 161.6$

This makes the eye-centered relative depths for D2:

$\frac{\|D_2 - E[0]\| - \|F - E[0]\|}{\|F - E[0]\|} = \frac{10\sqrt{229} - 100}{100} \approx 0.513$

$\frac{\|D_2 - E[\tfrac{1}{2}]\| - \|F - E[\tfrac{1}{2}]\|}{\|F - E[\tfrac{1}{2}]\|} = \frac{30\sqrt{29} - 20\sqrt{29}}{20\sqrt{29}} = 0.50$
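These values can be reproduced with a few lines (ours) evaluating Equation (19):

```python
import math

def rel_depth(D, F, E):
    """Equation (19): eye-centered relative depth (ρD − ρF)/ρF."""
    return (math.dist(D, E) - math.dist(F, E)) / math.dist(F, E)

F, D1, D2 = (0, 100), (0, 150), (-20, 150)
for t in (0.0, 0.5):
    E = (80 * t, 0)                        # eye node at 80 cm/sec
    print(t, rel_depth(D1, F, E), rel_depth(D2, F, E))
# t = 0:   0.5 for D1 and ≈0.513 for D2
# t = 1/2: ≈0.441 for D1 and 0.5 for D2
```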

By Equation (15), all distractors that lie on the lines where θ[D, tD] = 0 or θ[D, tD] = 2α[tD], where α[t] is the tracking angle, will satisfy Equation (20). This means that under these conditions, the motion/pursuit formula generates a precise estimate of depth for distractors on these two lines.

$\frac{d\theta}{d\beta}[t_D] = \text{relative eye-centered depth at time } t_D = \frac{\rho_D - \rho_F}{\rho_F}$  (20)

Distractors between these angles satisfy $\frac{d\theta}{d\beta}[t_D] < \frac{\rho_D - \rho_F}{\rho_F}$, meaning that the motion/pursuit formula generates an underestimate of relative eye-centered depth. However, distractors outside these angles satisfy the opposite, $\frac{d\theta}{d\beta}[t_D] > \frac{\rho_D - \rho_F}{\rho_F}$, meaning that the motion/pursuit formula generates an overestimate of depth. This inequality is illustrated in Figure (9) for a numerical example using D1 at t = 0 and D2 at t = 1/2 (speed 80 cm/sec). (It is currently unknown empirically whether the perception of depth from motion parallax actually varies over the horizontal plane as described by the motion/pursuit formula.)

Figure 9. Comparisons between the motion/pursuit formula and eye-centered relative depth.

A “circle analysis” gives another way to illustrate how the motion/pursuit formula generates time-varying estimates of relative depth when d2 ≠ 0. First, Figure (10) illustrates equidistant curves (left) (similar to the bottom panels of Figure 9) and curves showing equal value for the motion/pursuit formula (right) (similar to Figure 5 above). Both kinds of equi-value curves are circles, but the eye is at the center of the equidistant curves and at the bottom of the equi-motion/pursuit formula curves on the right.

Figure 10. Left: equidistant curves connecting points that are the same distance from the observer. Right: curves connecting points generating the same retinal image motion and the same dθ/dβ.

The point D1, which has d2 = 0, has a relative depth of 0.25 (e.g., d/f = 0.25) and dθ/dβ = 0.25 at t = 0. To illustrate “circle analysis” we chose the point D2, which has d2 ≠ 0 and lies on the circle of relative depth 0, but is also on the circle of constant dθ/dβ[0] = 0.25 (Figure 12). Since the motion/pursuit formula varies with time, a ±150 msec observation with fixate at 57 cm at translation speed 9.948 cm/sec (10 °/sec) produces an estimate of depth that is the same for both D1 and D2, as they lie on the same equi-motion/pursuit curve as shown in Figure (11). However, it should be clear from the figure that D1 and D2 also lie on different equidistance curves. To complete the numerical illustration, the precise values at D2 are: dθ/dβ[−0.15] = 0.2464, dθ/dβ[0] = 0.25, dθ/dβ[+0.15] = 0.2542.

Figure 12. The moving “motion/pursuit horopter” for ±1 sec.

Figure 11. Equidistant curves at time zero and curves with the same dθ/dβ[t] for t = −0.15, 0, +0.15 sec.

However, the observation time in this example is a brief 150 msec, and the equi-motion/pursuit curves change as the observation duration is extended. If the points in Figures 10 and 11 are observed for ±1.0 sec, the motion/pursuit-constant circles are shown in Figure (12). For the longer observation time interval, the values at D2 are: dθ/dβ[−1.0] = 0.001, dθ/dβ[0] = 0.25, dθ/dβ[+1] = 0.498. (See Figure 13 below.) However, again, it is currently unknown whether these time-varying cues of the motion/pursuit formula are used for visual perception.
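To trace such time courses, the cartesian form of Equation (9) suffices; a helper (ours) follows. The exact coordinates of D2 in Figures 10-12 are not listed in the text, so the point used below is illustrative only (chosen, as in the figures, to have dθ/dβ[0] = 0.25 but zero relative depth for a fixate at 57 cm).

```python
def dtheta_dbeta(f, d1, d2, s, t):
    """Equation (9): the time-varying motion/pursuit formula."""
    num = f * (d1 * (d1 + f) + d2**2 - 2 * d2 * s * t) - d1 * s**2 * t**2
    return num / ((d1 + f) * (f**2 + s**2 * t**2))

# Illustrative off-axis distractor (ours, not the published D2): f = 57 cm,
# translation speed 9.948 cm/sec, sampled at the observation times in the text.
for t in (-1.0, -0.15, 0.0, 0.15, 1.0):
    print(t, dtheta_dbeta(57.0, -11.4, -34.2, 9.948, t))
```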

Figure 13. Region of distractors where dθ/dβ is within 10% (dark) or 20% (light) of the relative depth, at the times shown or cumulatively over 0 < t0 < 0.5.

Figure (13) shows the hypothetical distractor positions for which the motion/pursuit formula produces a result that is within 10% (dark shading) and 20% (light shading) of the actual relative depth at various times. The left panel shows t = 0, where the dark line represents the fixate axis from Figure (1). For distractors falling within the dark region around this line, at that instant the motion/pursuit formula produces an estimate of depth that is within 10% of the actual value. This dark area corresponds roughly to the central 25 degrees of the visual field. As the observer translates rightward (panels 2 and 3), these zones of distractors receiving accurate depth estimates change, following the relationships outlined in Figure (9). However, the cumulative frame on the right side of Figure (13) shows all the points that would have been cued at a dθ/dβ[t]-value within 10% of relative depth sometime during the interval 0 ≤ t ≤ 0.5. This frame of Figure (13) means that in that half second interval the observer receives accurate cues to relative depth in a large portion of the horizontal plane “in front,” “behind,” and “to the side,” while any single time cue has gaps. In Section 4 below we show that dθ/dβ has a maximum value for each distractor and that the maximum cue is an accurate indicator of depth. It is possible that some neural centers may integrate ongoing signals in a way that gives the maximum.

Structural distance (or “geocentric” or “allocentric” distance)

Another important measure of difference between fixate and distractor is the rigid geometric distance between F and D, or “structural distance” (or fixate-allocentric distance). In order to compare the dimensionless monocular cue dθ/dβ[t] to this measure, we measure this distance relative to the closest distance that the eye comes to F and call this quantity the relative structural distance or “relative distance,” given by Equation (21).

$\frac{\|D - F\|}{\|F - E[0]\|} = \frac{\|D - F\|}{\|F\|} = \frac{\sqrt{d_1^2 + d_2^2}}{f}$  (21)

This relative distance depends only on the rigid geometric location of the two points in space (the fixate and distractor) and does not vary with time. The surface and contour graphs of equivalent relative structural distance are shown in Figure (14) with the sign convention that “near” points with d1 < 0 are counted negative, and “far” points with d1 > 0 positive. (A complete analysis of “near” and “far” is a little more complicated, but this gives the fundamental idea.) Therefore these surface graphs for near and far equivalent structural distance each depict half-cones with apex at the fixate, which has zero relative distance because it is the reference point in this analysis. The geometric distances from the fixate for the distractors shown in Figure (14) are:

$\|D_1 - F\| = \|\{0,150\} - \{0,100\}\| = 50$

$\|D_2 - F\| = \|\{-20,150\} - \{0,100\}\| = \sqrt{20^2 + 50^2} = 10\sqrt{29} \approx 53.9$

The relative distances of these distractors, from the eye-centered observer’s point of view at {0, 0} are:

$\frac{\|D_1 - F\|}{\|F\|} = \frac{\|\{0,150\} - \{0,100\}\|}{100} = \frac{1}{2} = 0.50$

$\frac{\|D_2 - F\|}{\|F\|} = \frac{\|\{-20,150\} - \{0,100\}\|}{100} = \frac{\sqrt{20^2 + 50^2}}{100} = \frac{\sqrt{29}}{10} \approx 0.539$

Compare these with the relative depths given above of 0.50 and 0.513 for the same distractors.
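The relative structural distances of Equation (21) can be checked in the same style (a sketch, ours):

```python
import math

F, D1, D2 = (0, 100), (0, 150), (-20, 150)
for D in (D1, D2):
    print(math.dist(D, F) / math.hypot(*F))   # Equation (21): 0.50 and ≈0.539
```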

Figure 14. Relative (rigid) structural distance.

Figure (15) shows the hypothetical distractor positions for which the motion/pursuit formula produces a result that is within 10% (dark shading) and 20% (light shading) of the actual relative structural distance at various times. The left panel shows t = 0, where the dark line represents the fixate axis from Figure (1). For distractors falling within the dark region around this line, at that instant the motion/pursuit formula produces an estimate of depth that is within 10% of the actual structural distance value. As the observer translates rightward (panels 2 and 3), these zones of distractors receiving accurate distance estimates change. The cumulative frame on the right side of Figure (15) shows all the points that would have been cued at a dθ/dβ[t]-value within 10% of relative distance sometime during the interval 0 ≤ t ≤ 0.5. Compare Figure (15), showing integrated relative distance inputs, to Figure (13) for integrated relative depth inputs. These accuracy comparisons are not intended to be a claim about accuracy of observer perception, but rather only to compare the geometric quantity and the mathematical motion/pursuit cue. Human depth perception is not veridical, so a geometric cue within 10% of one of these measures may be “good” for perception of the measure. In future experiments it will be important to compare: the geometry, the mathematical motion/pursuit cues, and actual perception.

Figure 15. Region of distractors where dθ/dβ is within 10% (dark) or 20% (light) of the relative structural distance, at the times shown or cumulatively over 0 < t < 0.5.

Comparisons of the Motion/Pursuit formula with egocentric and geocentric depth

Graphs of dθ/dβ and the two relative geometric measures are shown for t = 0 and t = 1/2 in Figure (16). Numerical comparisons are in Table 1. The graph of dθ/dβ[t] is shown dark and the graphs of the geometrical comparisons light. All these graphs are conical, but with different apexes. They all coincide above the line from the eye through the fixate.

Figure 16. Left: dθ/dβ[0]. Right: dθ/dβ[1/2]. Top: with relative eye-centered depth. Bottom: with relative structural distance.

Table 1. Numerical comparisons of distractors

f = 100; values for D1 and D2 on Figure 16

      {d2, d1}     dθ/dβ[0]   (ρD − ρF)/ρF [0]   ±∣D − F∣/f   dθ/dβ[1/2]   (ρD − ρF)/ρF [1/2]
D1    {0, 50}      0.5        0.5                0.5          0.385        0.441
D2    {−20, 50}    0.527      0.513              0.539        0.5          0.5

Section 4: The Two Dimensional Motion/Pursuit Formula and Structure from Motion

The invariant circles for the time zero motion/pursuit formula dθ/dβ[0] (on the left of Figure 7) show that when a distractor is displaced within the 2D horizontal plane along the circle with diameter from {0, 0} to {0, f + d1} on the fixate axis, the time zero value in Formula (9) will remain the same even though the distance from fixate to distractor changes. This means that the time zero cue is not always an accurate indicator even of relative distance when objects are displaced laterally from the fixation point. Binocular stereopsis, as evidenced by the similar curved shape of its horopter, is subject to the same limitation. However, with motion parallax a translating observer receives the time-varying motion and pursuit cues, which can improve (or degrade) the accuracy of the relative depth estimate provided by the motion/pursuit formula. Therefore it is important to know when the accuracy of the depth estimate provided by the motion/pursuit formula is improving, when it is at its maximum, and when it is declining. Here we describe the spatial and temporal conditions under which the motion/pursuit formula generates the most accurate estimate of relative depth for such distractors.

We know from Equation (15) that the motion/pursuit formula gives the exact relative depth at the time when fixate and distractor are in line, but when d2 ≠ 0, the distance to the fixate increases for the in-line eye position, ρF > f, lowering the relative depth ratio. When d2 = 0 (the distractor lies on the fixate axis as in Figure 1) the motion dθ/dt, the motion/pursuit ratio dθ/dα, and motion/pursuit formula dθ/dβ all peak at t = 0. When d2 ≠ 0, at translation speed s, the motion/pursuit ratio and the motion/pursuit formula dθ/dβ have critical points at the times given in Equation (22).

$t_1 = \frac{d_1^2 + d_2^2 + 2 d_1 f + \sqrt{\left(d_1^2 + d_2^2\right)\left(d_1^2 + d_2^2 + 4 d_1 f + 4 f^2\right)}}{2\, d_2\, s}$

$t_2 = \frac{d_1^2 + d_2^2 + 2 d_1 f - \sqrt{\left(d_1^2 + d_2^2\right)\left(d_1^2 + d_2^2 + 4 d_1 f + 4 f^2\right)}}{2\, d_2\, s}$  (22)

“Critical times” are times when the derivative of the motion pursuit formula (9) is zero: d(dθ/dβ)/dt = 0. These times are candidates for a maximum and we have the maximum for dθ/dβ at the time given in Equation (23).

$t_M[f, d_1, d_2] = \begin{cases} t_1 & \text{if } d_1 < 0 \\ 0 & \text{if } d_1 = 0 \\ t_2 & \text{if } d_1 > 0 \end{cases}$  (23)

In one distractor dimension, the critical time is tM = 0 and the motion/pursuit formula is exact as in Equation (1) above. Graphs of dθ/dβ at t = 0 and at t = tM are shown in Figure (17) with f = 100. The graphs also show the structural distance from the fixate to a distractor in the horizontal plane. Distances for distractors closer to the eye are shown with a negative sign. (The eye is at {d2, f + d1} = {0, 0} in the base plane.) The vertical axis is relative distance (from F) as a fraction.
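A cross-check (ours) of Equation (23) against a brute-force search over t, using the cartesian Equation (9); the example reproduces the {d2, d1} = {1, 1} row of Table 2 below. The names are our own.

```python
import math

def dtheta_dbeta(f, d1, d2, s, t):
    """Equation (9)."""
    num = f * (d1 * (d1 + f) + d2**2 - 2 * d2 * s * t) - d1 * s**2 * t**2
    return num / ((d1 + f) * (f**2 + s**2 * t**2))

def t_max(f, d1, d2, s):
    """Equation (23): the time maximizing dθ/dβ (requires d2 ≠ 0)."""
    if d1 == 0:
        return 0.0
    a = d1**2 + d2**2 + 2 * d1 * f
    root = math.sqrt((d1**2 + d2**2) * (d1**2 + d2**2 + 4 * d1 * f + 4 * f**2))
    return (a + root) / (2 * d2 * s) if d1 < 0 else (a - root) / (2 * d2 * s)

f, s, d1, d2 = 100.0, 80.0, 1.0, 1.0
tM = t_max(f, d1, d2, s)                              # ≈ −0.514 sec
best = max((k / 1000 for k in range(-2000, 2001)),
           key=lambda t: dtheta_dbeta(f, d1, d2, s, t))
print(tM, best, dtheta_dbeta(f, d1, d2, s, tM))       # −0.514, −0.514, ≈0.014
```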

Figure 18. Top left: signed ∣D − F∣ / ∣E[tM] − F∣. Top right: signed ∣D − F∣ / ∣F∣. Bottom left: signed ∣D − F∣ / ∣E[tM] − F∣ with dθ/dβ[tM] (dark). Bottom right: signed ∣D − F∣ / ∣F∣ (medium) with dθ/dβ[tM] (dark).

The light surface graph on the left shows the time zero motion/pursuit formula together with the signed structural distance in dark shading. Notice that it gives the exact relative distance along the fixate axis where d2 = 0, but that the light and dark diverge when d2 ≠ 0. The regions where the two surfaces are within 10% and 20% of each other can be seen in the left panel of Figure (15). In contrast, in the right panel, the 2D motion/pursuit formula dθ/dβ at the peak time tM[f, d1, d2] (light) is shown together with the structural distance (dark). The comparison shows that the maximum value of the motion/pursuit formula can provide a reasonably accurate estimate of the structural distance over a broad range of distractor positions within the plane.

We summarize the approximation of the right graph as follows:

The Motion/Pursuit Law for the 2D Fixation Plane

If a distractor D lies in the plane determined by the eyes and fixate F, then the motion/pursuit formula for lateral motion at the time tM where the motion/pursuit formula is maximal approximates the relative distance of the distractor from the fixate,

$\frac{d\theta}{d\beta}[t_M] \approx \pm\frac{\|D - F\|}{\|F\|}$  (24)

with the sign convention (−) for near distractors and (+) for far.

The 2D approximation of the Motion/Pursuit Law provided in Equation (24) is not a precise mathematical theorem, but only an approximation, as illustrated on the right of Figure (17). That is, for the 2D case where d2 ≠ 0 and distractors are displaced from the naso-occipital axis, the peak value of the dθ/dβ ratio provides a reasonable approximation to true structural depth for distractors in much of the central visual field, both nearer and farther than the fixation point. The deviation between the shaded surfaces shows that the approximation is less accurate for distractors farther away from the naso-occipital axis.

Figure 17. Left: dθ/dβ[0] with signed ∣D − F∣/f. Right: dθ/dβ[tM] with signed ∣D − F∣/f.

The Motion/Pursuit Law and Structure from Motion

However, this 2D Motion/Pursuit Law is a potentially useful solution to the “structure from motion” problem. The primary issue with “structure from motion” is to determine the geocentric distance from fixate to distractor, ∣DF∣, a rigid geometric quantity. But, similar to the relative egocentric depth, only a “relative” distance is possible (without knowing translation speed). In our statement of the 2D M/PL above we took this relative measurement to be in terms of the rigid distance from the central point to the fixate, f = ∣F∣ in Equation (24) and Figure (18).

Another alternative for the relative distance is the rigid fixed geometric distance between F and D, relative to the eye at the time of peak observation, given in Equation (25),

$\pm\frac{\|D - F\|}{\|E[t_M] - F\|}$  (25)

where E[t] = {st, 0} is the location of the eye node at time t. We use the same near = (−) and far = (+) sign convention. If the eye does not translate laterally very far compared to the distance of the fixate from the eye, this relative measurement is roughly the same, but we include graphical comparisons in Figure (18). The top of Figure (18) shows the two signed geometric quantities separately. The bottom shows the relative distance and peak M/P formula together for the two ways of measuring “relative.” The lower right graph is the same as the right graph of Figure (17). The surface graphs of Figure (18) qualitatively show how well the depth estimates generated by the Motion/Pursuit Law approximate the actual structural depth, but the specific quantitative differences might be difficult to see over the difference in height. Therefore, Figure (19) shows the same distractor regions (for f = 100 cm) where the peak time motion/pursuit formula is within 10% and 20% of the corresponding geometric measure.

Figure 19. Left: percent difference between dθ/dβ[tM] and signed ∣D − F∣/∣E[tM] − F∣. Right: percent difference between dθ/dβ[tM] and signed ∣D − F∣/∣F∣. Region of distractors within 10% (dark), 20% (light).

Some Numerical Comparisons

The time-dependent two dimensional motion/pursuit formula raises questions about the quantitative analysis of established experiments such as the “cut off” in depth sensitivity that Nagata (1991) reported. (Also see Cutting, Vishton (1995).) For example, Nagata used an apparatus with the distractor displaced laterally from the line-of-sight to the fixate. If this lateral displacement is 1 cm for a fixate at 100 cm, the motion/pursuit input to a depth of d1 = 1 cm is 40% greater 1/2 second after a constant speed observer crosses the center line; 1% lateral displacement causes a 40% change from the center-line observation in one half second. In a case like that outlined above, when the peak input is much greater than the time zero value (or the disparity equivalent), we believe that the motion/pursuit formulas will lead to a more accurate quantitative analysis of motion parallax experiments.

At the time of those experiments, the role of pursuit in motion parallax was not understood, and Nagata’s analysis does not account for pursuit. A more complete mathematical analysis of those experiments is now possible. In the range of Nagata’s “cut off,” pursuit is 100 times greater than retinal motion, and both signals are needed to determine depth. An important consideration is whether there is a physiological explanation for Nagata’s cut-off if the visual system cannot combine motion and pursuit signals that differ by two orders of magnitude.
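The two-orders-of-magnitude claim is quick to check from Equations (11)-(12) (a sketch, ours) for a 1 cm depth at a 100 cm fixate; the translation speed cancels in the ratio:

```python
f, d1, d2, s = 100.0, 1.0, 0.0, 80.0
pursuit = s / f                                                       # Equation (11)
motion = (d1**2 + d2**2 + d1 * f) * s / (((d1 + f)**2 + d2**2) * f)   # Equation (12)
print(pursuit / motion)   # = 101: pursuit is about 100x the retinal motion rate
```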

To help the reader better understand this issue by doing actual numerical comparisons, interactive programs at [S2008.3], [S2008.4] compute the peak motion/pursuit formula and the time-varying formula for user-variable fixate and distractor points. Using these programs, sample values of the peak time and distance at speed = 80 cm/sec are shown in Table 2.

Table 2. Comparison of Central Point Values and Peak Time Values

f = 100 comparisons

{d2, d1}     tM       (ρD − ρF)/ρF [0]   dθ/dβ [0]   ±∣D − F∣/f   dθ/dβ [tM]   (ρD − ρF)/ρF [tM]   s×tM
{1, 1}       −0.514   0.01               0.01        0.014        0.014        0.012               −41.13
{−1, 1}      0.514    0.01               0.01        0.014        0.014        0.012               41.13
{1, −1}      0.521    −0.01              −0.01       −0.014       −0.014       −0.012              41.716
{−1, −1}     −0.521   −0.01              −0.01       −0.014       −0.014       −0.012              −41.716
{1, 2}       −0.292   0.02               0.02        0.022        0.022        0.021               −23.346
{1, −2}      0.298    −0.02              −0.02       −0.022       −0.022       −0.021              23.874
{5, 10}      −0.279   0.101              0.102       0.112        0.112        0.106               −22.354
{5, −10}     0.313    −0.099             −0.097      −0.112       −0.111       −0.106              25.0

The last column in Table 2, s×tM, is the distance along the horizontal axis where the peak mathematical depth estimate is generated. For example, in Nagata’s set-up, with a speed of 80 cm/sec and critical time −0.51 sec, the maximum estimate of distractor relative depth occurs 41 cm to the left. The 2D motion/pursuit law needs the observer to maintain fixation a fairly long distance from the center line to get the peak input for relative displacement for some distractors. Perhaps it is not surprising that the longest wait times are roughly in the direction of the invariant circle where dθ/dt[0] = 0 as we show next.

Time Delays to Peak Observation

Even a short period of observation generates an accurate estimate of the relative depth of distractors in a region close to the center line. For example, when the observer speed is 80 cm/sec, the distractor points that are cued at the peak time within ±150 msec of the center line are shown on the left of Figure (20) (eye position within ±12 cm in ±0.15 sec at 80 cm/sec). The locations of distractors generating peak values in the motion/pursuit formula at other time intervals are shown as gray regions in the other two panels. For distractors within the gray regions, the motion/pursuit formula generates values that are close approximations of the actual relative displacement. Distractors in the white regions in the panels of Figure (20) are points where the motion/pursuit formula has not yet peaked, so the geometric cues have not reached a maximum value. In the regions where the gray portions of Figure (19) (showing accuracy over space) and Figure (20) (showing accuracy over temporal intervals) overlap, an accurate cue to relative distance is generated during the time interval. While Figure (19) shows distractor locations that give a cue within 10%, Figure (20) indicates that some of these distractor locations require a long observer translation time for the cue to actually reach its maximum value.

Figure 21. Time zero and maximal time structural reconstruction.

Figure 20. Distractors with peak time −δ < t1, t2 < δ, for f = 100 cm, speed 80 cm/sec.

A comparison of the three panels shows that with increasing observation time more distractor points reach their maximum motion/pursuit value. For example, the right panel has a very small white region. To reach maximum, distractor points within this region require that the observer move even farther from the center line (e.g., this movement would be beyond 80 cm [80 cm/s translation speed] for the frame on the right of Figure 20). Of course, the motion/pursuit formula does generate an estimate of relative depth before the maximum value is achieved, but it will be a greater underestimate of the actual structural depth. For example, the overlapping white regions on the right frame of Figure (19) and the center frame of Figure (20) are distractors that have dθ/dβ[t] more than 10% different from the relative distance during the whole time interval 0 < t < 0.5 sec. For precise calculations of the approximation the reader is again directed to the interactive programs at [S2008.3] and [S2008.4], where they can explore specific numeric examples of their choosing. Overall, this analysis indicates that the motion/pursuit formula quickly generates accurate estimates of relative depth for distractor points in regions along the line-of-sight (i.e., the gray region in the left panel of Figure 20). However, for distractors in regions to either side of the fixate (i.e., the white region in the middle and right panels of Figure 20), the estimates of relative depth become more accurate with longer viewing time.

A Sample Structural Reconstruction

In Figure (21) we illustrate mathematically how the motion/pursuit ratio could reconstruct a shape. Our reconstruction takes motion/pursuit data from two circles, each including a diameter segment. One is a circle of radius 10 cm with center 20 cm beyond a fixate at {0, 100} cm. The other is a circle of radius 10 cm with center 20 cm nearer to the observer.

On the left part of Figure (21) the geometric shape is shown dashed and the thin solid curves represent position computations of points on the shape from the time zero motion/pursuit quantities. The point shown in the middle is the fixate. (The eye is far below on this scale.) The difference between the solid and dashed curves shows the inaccuracy of the single time zero motion/pursuit cue. (Computations with binocular disparity would give a similar mis-representation of the shape as the solid curve on the left part of the figure, see Stroyan (2010).)

On the right part of Figure (21), the dashed curves again show the geometrical shape, but here the solid curves, which are nearly the same as the dashed ones, are the computed positions of points using the maximal time motion/pursuit formula (with different times for different points).

These figures were constructed by using the value of dθ/dβ for the relative depth of points on the figure (which could be perceived with a neural combination of dθ/dt and dα/dt). On the left of Figure (21), the time zero values of dθ/dβ for a large sample of the points are used to re-construct their positions, resulting in the thin solid curve. On the right side of Figure (21), the maximal time value of dθ/dβ of each point is used to re-construct its position, resulting in the thin solid curve that closely approximates the geometric shape. We make no empirical claim about this mathematical structure-from-motion reconstruction; it only shows the mathematical possibility of accurate reconstruction of a shape from the maximum time motion/pursuit ratio. (And it does not account for the fixation time needed to observe at peak time, as shown in Figure (20).)
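A value-level sketch (ours) of the comparison behind Figure (21): for sample points on the far circle, compare the time zero and peak-time values of dθ/dβ with the true signed relative distance ±∣D − F∣/∣F∣. (We compare values only; the geometric inversion back to positions is not shown, and the sampling is our own.)

```python
import math

def dtheta_dbeta(f, d1, d2, s, t):
    """Equation (9)."""
    num = f * (d1 * (d1 + f) + d2**2 - 2 * d2 * s * t) - d1 * s**2 * t**2
    return num / ((d1 + f) * (f**2 + s**2 * t**2))

def t_max(f, d1, d2, s):
    """Equation (23), with tM = 0 on the fixate axis."""
    if d1 == 0 or d2 == 0:
        return 0.0
    a = d1**2 + d2**2 + 2 * d1 * f
    root = math.sqrt((d1**2 + d2**2) * (d1**2 + d2**2 + 4 * d1 * f + 4 * f**2))
    return (a + root) / (2 * d2 * s) if d1 < 0 else (a - root) / (2 * d2 * s)

f, s = 100.0, 80.0
for k in range(8):  # sample the far circle: radius 10 cm, center 20 cm beyond F
    phi = 2 * math.pi * k / 8
    d1, d2 = 20 + 10 * math.cos(phi), 10 * math.sin(phi)
    true_rel = math.copysign(math.hypot(d1, d2) / f, d1)   # signed |D − F|/|F|
    tM = t_max(f, d1, d2, s)
    print(round(true_rel, 3),
          round(dtheta_dbeta(f, d1, d2, s, 0.0), 3),       # time zero estimate
          round(dtheta_dbeta(f, d1, d2, s, tM), 3))        # peak-time estimate
```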

We propose experiments to determine whether people use the peak-time tM observation. But we also know (since von Kries) that people do not maintain fixation for very long times. (Try to force yourself to do so while riding in a car. It's fun. Your brain "wants" to shift fixation point.) So our proposal must be tempered by "how long it takes to get to tM." A properly designed experiment can only test in the gray regions of Figure 20.

Section 5: Fovea-centered Flat Retina Coordinates & Another “Depth”

Flat retina coordinates (described below) are used extensively in the vision literature. This section shows how to compute, from the motion/pursuit ratio expressed in those coordinates, a kind of depth appropriate to them. This is not the same as the angular dθ/dβ quantity calculated in the coordinates above; rather, it is the motion/pursuit ratio that arises naturally in flat retina coordinates. We show that the flat depth of Formula (28) has a different invariance structure, which could be used to decide empirically whether it is employed. We also give Formula (31), which expresses flat depth in terms of the angular coordinates used above. The flat depth formula requires additional perceptual cues beyond the motion and pursuit cues, so we have not emphasized it.

One way to measure retinal images of objects is to project images onto a flat plane held one unit in front of the observer's eye (mathematicians call these "projective" or "homogeneous" coordinates, but here they are eye-centered). This is one standard approach in the study of retinal motion or motion parallax, as in Longuet-Higgins & Prazdny (1980) and Fermüller & Aloimonos (1997). The approach begins with cartesian observer coordinates. One cartesian axis, the ζ-axis (zeta, or "Z"-axis), points from the observer toward the fixate, and a perpendicular ξ-axis (xi, the Greek "X"-axis) points "observer right," as shown in Figure 22 (left panel). The ξ-component of the fixate is zero, corresponding to the location of the fovea. The ζ-component of the fixate is the distance from the observer to the fixate, so the fixate has components (ξ1, ζ1) = (0, ∣FE∣). The cartesian observer components of the distractor are (ξ2, ζ2), as shown in Figure 22 (left panel). By similar triangles, the flat retina coordinate of the distractor is x2 = ξ2/ζ2.

Figure 22. Projective fovea-centered coordinates and the "flat retina"

Think of an x-axis perpendicular to the line of sight to the fixate and one unit from the eye, as shown in Figure 22 (middle panel). This might be illustrated by a horizontal line on glasses one unit in front of the observer's eye, but this axis turns with the eye, so it is more descriptively called the "flat retina," as in Fermüller & Aloimonos (1997). The "flat retina" is just one mathematical way to coordinatize the observer's retina; one can easily compute the location, on a circular retina behind the eye, of the image of a point with given flat retina coordinates. The physical translation velocity of the observer is the time derivative of the position of the eye node, $\frac{dE}{dt} = \dot{E} = V$. We express this in cartesian observer coordinates as V = (u, w), so V = w·Aim + u·Right, as shown in Figure 22 (right panel). The time derivatives of the cartesian observer coordinates of the distractor are:

$$\dot{\xi}_2 = -u + \frac{u}{\zeta_1}\,\zeta_2 \quad\text{and}\quad \dot{\zeta}_2 = -w - \frac{u}{\zeta_1}\,\xi_2 \tag{26}$$

Differentiating the expression x2 = ξ2/ζ2 gives:

$$\dot{x}_2 = -\frac{u}{\zeta_2} + \frac{u}{\zeta_1} + x_2\left(\frac{w}{\zeta_2} + \frac{u}{\zeta_1}\,x_2\right) \tag{27}$$
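Before turning (26)-(27) into a depth formula, we note that the bookkeeping can be checked numerically. The sketch below (our own illustrative configuration and frame conventions, not from the paper) builds the rotating observer frame explicitly, computes x2 = ξ2/ζ2 as the eye translates with velocity V, and compares a central finite-difference derivative of x2 with the right-hand side of Formula (27).

```python
# Finite-difference check of Formula (27) in an illustrative configuration.
import numpy as np

F = np.array([0.0, 100.0])     # fixate
D = np.array([25.0, 140.0])    # distractor
V = np.array([80.0, 10.0])     # eye velocity; not purely lateral, so w != 0

def frame(E):
    aim = (F - E) / np.linalg.norm(F - E)   # zeta-axis: unit vector toward F
    right = np.array([aim[1], -aim[0]])     # xi-axis: "observer right"
    return right, aim

def x2_of(E):
    right, aim = frame(E)
    q = D - E
    return np.dot(q, right) / np.dot(q, aim)   # flat-retina coordinate

E = np.array([-5.0, 0.0])                      # eye position at the test instant
right, aim = frame(E)
u, w = np.dot(V, right), np.dot(V, aim)        # V = u*Right + w*Aim
zeta1 = np.linalg.norm(F - E)                  # distance to the fixate
zeta2 = np.dot(D - E, aim)
x2 = x2_of(E)

h = 1e-6
numeric = (x2_of(E + V*h) - x2_of(E - V*h)) / (2*h)
analytic = -u/zeta2 + u/zeta1 + x2*(w/zeta2 + (u/zeta1)*x2)   # Formula (27)
print(numeric, analytic)   # the two values agree to roundoff
```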

The term u/ζ1 is the rate the eye (or observer’s coordinate axis) turns to “pursue” the fixate and keep its image on the fovea, so if we divide the retinal motion rate by the eye pursuit rate we obtain a formula for depth as follows.

Let $\frac{\text{rate of flat retinal motion}}{\text{rate of eye pursuit}} = \frac{M}{P}$, so that $\frac{M}{P} = \frac{\zeta_1}{u}\,\dot{x}_2 = \left(\frac{\zeta_2-\zeta_1}{\zeta_1} + x_2\,\frac{w}{u}\right)\frac{\zeta_1}{\zeta_2} + x_2^2$. By algebra we obtain a new "flat relative depth" Equation (28) that depends only on the ζ-component of depth.

$$\frac{\zeta_2-\zeta_1}{\zeta_1} = \frac{\dfrac{M}{P} - x_2\left(\dfrac{w}{u} + x_2\right)}{1 - \dfrac{M}{P} + x_2^2} \tag{28}$$

The quantity $(\zeta_2-\zeta_1)/\zeta_1$ does not change if the distractor is moved along lines parallel to the ξ-axis, so the graph of "flat relative depth" is a plane sloping up from the eye toward the fixate, shown in Figure 23. Instead of the invariant circles used earlier in this analysis, we now have invariant lines perpendicular to the line from the eye to the fixate.
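Continuing the check above with the same illustrative frame conventions, the following sketch recovers the flat relative depth from M/P, x2, and w/u via Formula (28) and illustrates the invariance: two distractors with the same ζ-component, i.e., on the same line parallel to the ξ-axis, yield the same flat depth even though their motion/pursuit ratios differ.

```python
# Formula (28) and its invariance along lines parallel to the xi-axis
# (illustrative configuration, our frame conventions).
import numpy as np

F = np.array([0.0, 100.0]); V = np.array([80.0, 10.0]); E = np.array([-5.0, 0.0])
aim = (F - E) / np.linalg.norm(F - E)
right = np.array([aim[1], -aim[0]])
u, w = np.dot(V, right), np.dot(V, aim)
zeta1 = np.linalg.norm(F - E)

def flat_depth(D):
    q = D - E
    xi2, zeta2 = np.dot(q, right), np.dot(q, aim)
    x2 = xi2 / zeta2
    x2dot = -u/zeta2 + u/zeta1 + x2*(w/zeta2 + (u/zeta1)*x2)   # Formula (27)
    MP = (zeta1 / u) * x2dot                                   # motion/pursuit
    return (MP - x2*(w/u + x2)) / (1 - MP + x2**2)             # Formula (28)

# Two distractors with equal zeta-component (same line parallel to xi-axis):
Da = E + 10.0*right + 140.0*aim
Db = E - 30.0*right + 140.0*aim
print(flat_depth(Da), flat_depth(Db), (140.0 - zeta1)/zeta1)   # all equal
```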

Figure 23. Cartesian observer coordinates and the "flat retina" (or projective coordinates)

In the particular case when the fixate, distractor, and observer are in line, so that x2 = 0, we obtain Formula (29), equivalent to Formula (15) above, where M = dθ/dt and P = dα/dt.

$$\frac{\zeta_2-\zeta_1}{\zeta_1} = \frac{\dfrac{M}{P}}{1 - \dfrac{M}{P}} \tag{29}$$

At the point where the observer crosses the fixate axis perpendicular to the observer translation (w = 0), we get Formula (30), which is similar to Formula (13) above.

$$\frac{\zeta_2-\zeta_1}{\zeta_1} = \frac{\dfrac{M}{P} - x_2^2}{1 - \dfrac{M}{P} + x_2^2} \tag{30}$$
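A quick numerical check of the two special cases, with illustrative values of ζ1, ζ2, and x2, confirms how Formula (28) reduces to Formulas (29) and (30): with the distractor on the line of sight, and at the crossing point with w = 0, both forms return the exact flat relative depth.

```python
# Special cases of Formula (28), using illustrative values.
def depth28(MP, x2, w_over_u):
    return (MP - x2*(w_over_u + x2)) / (1 - MP + x2**2)   # Formula (28)

zeta1 = 100.0

# Formula (29): distractor on the line of sight (x2 = 0). From Formula (27)
# with x2 = 0, M/P = 1 - zeta1/zeta2, for any translation slope w/u.
zeta2 = 130.0
MP = 1 - zeta1/zeta2
print(depth28(MP, 0.0, 0.7), MP / (1 - MP))               # both 0.3

# Formula (30): crossing point (w = 0), distractor off the axis.
xi2, zeta2 = 20.0, 130.0
x2 = xi2 / zeta2
MP = zeta1 * (-1/zeta2 + 1/zeta1 + x2*x2/zeta1)           # (27) with w = 0
print(depth28(MP, x2, 0.0), (MP - x2**2) / (1 - MP + x2**2))  # both 0.3
```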

Equation (28) gives the exact flat relative depth of the distractor beyond the fixate (from the observer), $(\zeta_2-\zeta_1)/\zeta_1$, but it is more complicated to use perceptually for the following reason. The quantities M, P, x2, and the slope of the direction of observer translation, w/u, are all needed as perceptual cues to use this "better" quantity in the brain. It might be possible to analyze the perceptual effect of the x2 term as a distance from central vision and the perceptual effect of the slope w/u. Certainly we are aware of "looking off to the side" as we move, and w/u is a measure of "how much off to the side." The approach of the preceding sections of this article was to use "circle analysis" to relate the geometry of space to the motion/pursuit ratio cue alone.

To compare the angular and flat retina formulas, note that $\frac{d\alpha}{dt} = \frac{\dot{s}}{\rho_F}\,\mathrm{Cos}[\alpha] = \frac{u}{\zeta_1}$ and $\mathrm{Tan}[\theta] = x_2$, so $\mathrm{Sec}[\theta]^2\,\frac{d\theta}{dt} = \frac{dx_2}{dt}$, giving $\frac{M}{P} = \mathrm{Sec}[\theta]^2\,\frac{d\theta}{d\alpha}$ and Formula (31).

$$\frac{\zeta_2-\zeta_1}{\zeta_1} = \frac{\dfrac{d\theta}{d\alpha}}{1 - \dfrac{d\theta}{d\alpha}} - \frac{\mathrm{Sin}[\theta]^2 + \mathrm{Cos}[\theta]\,\mathrm{Sin}[\theta]\,\mathrm{Tan}[\alpha]}{1 - \dfrac{d\theta}{d\alpha}} \tag{31}$$

Compare (31) to the angular Formula (13, α = 0 at t = 0), Formula (1, α = 0 and θ = 0), and Formula (15, α = β, θ = 0). The last term on the right of Formula (31) is the “correction” needed to make the motion/pursuit formula exactly equal to this flat relative depth. The angle α measures how much “off to the side” we look and θ measures how far from central vision the distractor lies. The Longuet-Higgins approach gives a precise mathematical result with the formula in Equation (28) that might be useful in computer vision, but we do not know of evidence that people can perceive the “flat relative depth.”
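The change of variables behind Formula (31) can also be verified numerically. In the sketch below (illustrative values of ζ1, ζ2, x2, and w/u), we form M/P from Formula (27), convert it to dθ/dα via M/P = Sec[θ]²·dθ/dα with Tan[θ] = x2 and Tan[α] = w/u, and confirm that the right side of Formula (31) reproduces the exact flat relative depth.

```python
# Numerical check of Formula (31) with illustrative values.
import numpy as np

zeta1, zeta2, x2, w_over_u = 100.0, 130.0, 0.25, 0.4

# M/P = (zeta1/u) * x2dot, with x2dot from Formula (27):
MP = 1 - zeta1/zeta2 + x2*(w_over_u*zeta1/zeta2 + x2)

theta, alpha = np.arctan(x2), np.arctan(w_over_u)   # Tan[theta]=x2, Tan[alpha]=w/u
dth = MP * np.cos(theta)**2                         # M/P = Sec^2 * dtheta/dalpha

flat31 = (dth / (1 - dth)
          - (np.sin(theta)**2 + np.cos(theta)*np.sin(theta)*np.tan(alpha))
            / (1 - dth))
print(flat31, (zeta2 - zeta1)/zeta1)                # both equal 0.3
```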

Discussion

In the 1925 edition of the Helmholtz (1910) treatise, von Kries (pp. 371-372) wrote of the effects of motion parallax with "the eyes fastened for a brief space on some definite point" and contrasted them with the retinal motion effects of forward observer translation, concluding only that, "the probability is that both of them generally contribute to the result in some way, although it would be hard to say exactly how." Recent work (Nawrot, 2003; Naji & Freeman, 2004; Nawrot & Joyce, 2006; Nadler, Angelaki, DeAngelis, 2008; Nawrot & Stroyan, 2009; Nadler, Nawrot, Angelaki, DeAngelis, 2009) has begun to show how the combined effect of retinal image motion and smooth pursuit eye movements contributes to visual depth perception, both psychophysically and neurophysiologically. Our previous work (Nawrot & Stroyan 2009) proposed that the value of the ratio dθ/dα is crucial for the perception of depth from motion parallax, but the mathematical justification offered there was restricted to points in central vision. However, the perception of depth from motion parallax is not restricted to central vision, so the present work (Section 1) extends the analysis of the motion/pursuit ratio across the horizontal plane. This broadens the known space where the dθ/dα ratio has a well-defined relationship to the perception of depth from motion. Moreover, the analysis immediately made apparent a number of important new ideas.

A fundamental limitation of the dθ/dα ratio at a single time is shared with binocular disparity. If the distractor D is displaced from the fixation axis as in Figure 2, binocular disparity does not give an accurate measure of the relative distance between F and D, because binocular disparity is constant on circles through the distractor and the eye nodes. We showed (Section 2) that the time-zero 2-dimensional retinal motion formulas are invariant on similar circles at a fixed time. This analysis shows that both binocular disparity and motion parallax, determined by the dθ/dα ratio at a fixed time, have a very similar limitation in the perception of depth with 2D displacement. However, the motion/pursuit ratio changes with time and, if the information in the varying cue is combined, the motion/pursuit ratio can mathematically determine accurate structure (Section 4).

One interesting aspect of this analysis is the way that the M/PL could apply both to an observer-centered "egocentric" depth and to a fixation-point-centered "allocentric" concept of depth. While the two concepts of depth are similar along the line-of-sight, we analyzed both notions in some detail (Section 3) because it is not clear empirically how they are used in depth perception outside central vision.

The time-varying aspect of the motion/pursuit analysis is another novel concept arising from this analysis (Section 4). A translating observer receives the cues of retinal motion and tracking pursuit continuously, not just at one time. The 2D M/PL says that the peak-time geometric cue is an accurate indicator of the relative distance from distractor to fixate, but we do not know of experiments that test whether people use the peak cue. The value of the motion/pursuit formula increases, then decreases after the peak time determined by each distractor. One question is whether the visual system can recognize the peak – the time of the most accurate distance estimate. For example, everyone is familiar with the loudness peak we hear when a truck or airplane approaches, goes by, and continues on, or the intensity peak we see watching a distant lighthouse. While it is an open empirical question whether observers detect the "peak time" for motion parallax, evidence suggests that it is possible. Although the human visual system is rather insensitive to actual acceleration of the retinal image (Watamaniuk & Duchon, 1992; Simpson, 1994), motion velocity sensitivity appears to increase with acceleration and decrease with deceleration (Schlack, Krekelberg, Albright, 2008). For motion parallax, this difference in velocity sensitivity between acceleration and deceleration phases may help maintain high velocity sensitivity until the peak motion/pursuit ratio is realized, then produce decreased sensitivity after the peak value. In this way the system may be better (more sensitive) at updating depth estimates while the retinal image velocity is increasing as it approaches the peak. Of course, it is possible that the time-varying aspects of depth perception from motion parallax are just as important to perception as the single peak value. Indeed, the changes themselves may be cues to depth perception. However, these are all empirical questions raised by an understanding of the geometric inputs that the visual system may use for the perception of depth from motion parallax.

Regan, Beverley, and Cynader (1978) and Regan and Beverley (1978, 1979) performed numerous experiments linking binocular disparity and motion (especially changing apparent size due to motion), and Richards (1985) gave a mathematical derivation of "structure from motion" based on a combination of disparity and retinal motion, partly to explain that work. (Also see Bradshaw, Parton, Eagle (1998) and Bradshaw, Parton, Glennerster (2002).) The Motion/Pursuit Law gives a more accurate mathematical solution, but the novel feature is that it is based on a different mechanism: smooth eye pursuit.

We have presented the formulas in some detail because we believe that the dynamic geometry of the 2-dimensional motion/pursuit ratio will be useful both in understanding the neural processing underlying the perception of depth from motion parallax and in designing and analyzing experiments. In the future, this analysis will be important for comparing the geometry of space, the mathematical motion/pursuit cues, and the actual visual perception of depth. While the current work informs us of the theoretical utility of the M/PL, we still know little about how well the M/PL characterizes the visual perception of depth outside central vision. Moreover, the time-varying nature of these geometric cues could only be considered with this mathematical approach, and now that it is well described it will be important to determine whether or not the accurate perception of depth and structure is linked to the peak values of the dθ/dα cue. The dynamic geometry of depth from motion parallax forms a starting place for such investigations.

Acknowledgement

We would like to thank anonymous reviewers for many helpful comments.

Contributor Information

Keith Stroyan, Mathematics Department, University of Iowa.

Mark Nawrot, Department of Psychology, North Dakota State University.

References

1. Bradshaw MF, Parton AD, Eagle RA. Interaction of binocular disparity and motion parallax in determining perceived depth and perceived size. Perception. 1998;27:1317–1331. doi: 10.1068/p271317.
2. Bradshaw MF, Parton AD, Glennerster A. Task dependent use of binocular disparity and motion parallax information. Vision Research. 2002;40:3725–3734. doi: 10.1016/s0042-6989(00)00214-5.
3. Cutting JE, Vishton PM. Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth. In: Epstein W, Rogers S, editors. Handbook of Perception and Cognition, Vol. 5: Perception of Space and Motion. Academic Press; 1995. pp. 69–117.
4. Domini F, Caudek C. 3-D structure perceived from dynamic information: A new theory. Trends in Cognitive Sciences. 2003;7:444–449. doi: 10.1016/j.tics.2003.08.007.
5. Fermüller C, Aloimonos Y. On the geometry of visual correspondence. International Journal of Computer Vision. 1997;21(3):223–247.
6. Gordon DA. Static and dynamic fields in human space perception. J. Opt. Soc. Am. 1965;55:1296–1303. doi: 10.1364/josa.55.001296.
7. Hanes DA, Keller J, McCollum G. Motion parallax contribution to perception of self-motion and depth. Biol. Cybern. 2008;98:273–293. doi: 10.1007/s00422-008-0224-2.
8. Helmholtz H von. Treatise on Physiological Optics. Dover; New York: 1910/1962. English translation by J. P. C. Southall, three volumes bound as two, from the 3rd German edition of Handbuch der Physiologischen Optik; the von Kries Note 4 appeared in the 1925 edition.
9. Hillis JM, Banks MS. Are corresponding points fixed? Vision Research. 2001;41:2457–2473. doi: 10.1016/s0042-6989(01)00137-7.
10. Koenderink JJ, van Doorn AJ. Invariant properties of the motion parallax field due to the movement of rigid bodies relative to the observer. Optica Acta. 1976;22:773–779.
11. Koenderink JJ, van Doorn AJ. Facts on optic flow. Biol. Cybern. 1987;56(4):247–254. doi: 10.1007/BF00365219.
12. Longuet-Higgins HC, Prazdny K. The interpretation of a moving retinal image. Proc. R. Soc. Lond. B. 1980;208:385–397. doi: 10.1098/rspb.1980.0057.
13. Miles FA, Busettini C. Ocular compensation for self-motion: visual mechanisms. Annals of the New York Academy of Sciences. 1992;656:220–232. doi: 10.1111/j.1749-6632.1992.tb25211.x.
14. Miles FA. The sensing of rotational and translational optic flow by the primate optokinetic system. In: Miles FA, Wallman J, editors. Visual Motion and Its Role in the Stabilization of Gaze. Elsevier; New York: 1993. pp. 393–403.
15. Miles FA. The neural processing of 3-D visual information: evidence from eye movements. European Journal of Neuroscience. 1998;10:811–822. doi: 10.1046/j.1460-9568.1998.00112.x.
16. Nadler JW, Angelaki DE, DeAngelis GC. A neural representation of depth from motion parallax in macaque visual cortex. Nature. 2008;452:642–645. doi: 10.1038/nature06814.
17. Nadler JW, Nawrot M, Angelaki DE, DeAngelis GC. MT neurons combine visual motion with a smooth eye movement signal to code depth sign from motion parallax. Neuron. 2009;63:523–532. doi: 10.1016/j.neuron.2009.07.029.
18. Nagata S. How to reinforce the perception of depth in single two-dimensional pictures. In: Ellis SR, editor. Pictorial Communication in Virtual and Real Environments. Taylor & Francis; 1991. pp. 527–545.
19. Naji JJ, Freeman TCA. Perceiving depth order during pursuit eye movements. Vision Research. 2004;44:3025–3034. doi: 10.1016/j.visres.2004.07.007.
20. Nakayama K, Loomis JM. Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis. Perception. 1974;3:63–80. doi: 10.1068/p030063.
21. Nawrot M. Eye movements provide the extra-retinal signal required for the perception of depth from motion parallax. Vision Res. 2003;43:1553–1562. doi: 10.1016/s0042-6989(03)00144-5.
22. Nawrot M, Joyce L. The pursuit theory of motion parallax. Vision Res. 2006;46:4709–4725. doi: 10.1016/j.visres.2006.07.006.
23. Nawrot M, Stroyan K. The motion/pursuit law for visual depth perception from motion parallax. Vision Res. 2009;49:1969–1978. doi: 10.1016/j.visres.2009.05.008.
24. Nawrot M, Stroyan K. Integration time for the mechanisms serving the perception of depth from motion parallax (abstract). Journal of Vision. 2010;10(7):50.
25. Perrone JA, Stone LS. A model of self-motion estimation within primate extrastriate visual cortex. Vision Res. 1994;34(21):2917–2938. doi: 10.1016/0042-6989(94)90060-4.
26. Regan D, Beverley KI, Cynader M. The visual perception of motion in depth. Sci. Amer. 1978;241:136–151. doi: 10.1038/scientificamerican0779-136.
27. Regan D, Beverley KI. Looming detectors in the human visual pathway. Vision Res. 1978;18:415–421. doi: 10.1016/0042-6989(78)90051-2.
28. Regan D, Beverley KI. Binocular and monocular stimuli for motion in depth: changing disparity and changing size feed the same motion-in-depth stage. Vision Res. 1979;19:1331–1342. doi: 10.1016/0042-6989(79)90205-0.
29. Richards W. Structure from stereo and motion. J. Opt. Soc. Am. A. 1985;2:343–349. doi: 10.1364/josaa.2.000343.
30. Schlack AB, Krekelberg B, Albright TD. Speed perception during acceleration and deceleration. J. of Vision. 2008;8(8):9, 1–11. doi: 10.1167/8.8.9.
31. Simpson WA. Temporal summation of visual motion. Vision Res. 1994;34:2547–2559. doi: 10.1016/0042-6989(94)90241-0.
32. Stroyan K. Interactive computation of geometric inputs to vision. 2008. [S2008.3] Motion Pursuit Law in 2D: Visual Depth Perception 3, http://demonstrations.wolfram.com/MotionPursuitLawIn2DVisualDepthPerception3/; [S2008.4] Motion Pursuit Law on Invariant Circles: Visual Depth Perception 4, http://demonstrations.wolfram.com/MotionPursuitLawOnInvariantCirclesVisualDepthPerception4/; [S2008.8] Vieth-Müller Circles: Visual Depth Perception 8, http://demonstrations.wolfram.com/ViethMullerCirclesVisualDepthPerception8/; [S2008.12] Motion Parallax versus Depth: 2D Visual Depth Perception 12, http://demonstrations.wolfram.com/MotionParallaxVersusDepth2DVisualDepthPerception12/
33. Watamaniuk SN, Duchon A. The human visual system averages speed information. Vision Res. 1992;32:931–941. doi: 10.1016/0042-6989(92)90036-i.
34. Stroyan K. Motion parallax is asymptotic to binocular disparity. 2010. http://arxiv.org/abs/1010.0575.
