Skip to main content
Springer logoLink to Springer
. 2013 Apr 2;104(3):286–314. doi: 10.1007/s11263-012-0607-7

A Variational Approach to Video Registration with Subspace Constraints

Ravi Garg 1, Anastasios Roussos 1, Lourdes Agapito 1,
PMCID: PMC3724559  PMID: 23908564

Abstract

This paper addresses the problem of non-rigid video registration, or the computation of optical flow from a reference frame to each of the subsequent images in a sequence, when the camera views deformable objects. We exploit the high correlation between 2D trajectories of different points on the same non-rigid surface by assuming that the displacement of any point throughout the sequence can be expressed in a compact way as a linear combination of a low-rank motion basis. This subspace constraint effectively acts as a trajectory regularization term leading to temporally consistent optical flow. We formulate it as a robust soft constraint within a variational framework by penalizing flow fields that lie outside the low-rank manifold. The resulting energy functional can be decoupled into the optimization of the brightness constancy and spatial regularization terms, leading to an efficient optimization scheme. Additionally, we propose a novel optimization scheme for the case of vector valued images, based on the dualization of the data term. This allows us to extend our approach to deal with colour images which results in significant improvements on the registration results. Finally, we provide a new benchmark dataset, based on motion capture data of a flag waving in the wind, with dense ground truth optical flow for evaluation of multi-frame optical flow algorithms for non-rigid surfaces. Our experiments show that our proposed approach outperforms state of the art optical flow and dense non-rigid registration algorithms.

Introduction

Optical flow in the presence of non-rigid deformations is a challenging task and an important problem that continues to attract significant attention from the computer vision community. It has wide ranging applications from medical imaging and video augmentation to non-rigid structure from motion. Given a template image of a non-rigid object and an input image of it after deforming, the task can be described as one of finding the displacement field (warp) that relates the input image back to the template. In this paper we consider long video sequences instead of a single pair of frames—each of the images in the sequence must be aligned back to the reference frame. Our work concerns the estimation of the vector field of displacements that maps pixels in the reference frame to each image in the sequence (see Fig. 1).

Fig. 1.

Fig. 1

Video registration is equivalent to the problem of estimating dense optical flow Inline graphic between a reference frame Inline graphic and each of the subsequent frames Inline graphic in a sequence. We propose a multi-frame optical flow algorithm that exploits temporal consistency by imposing subspace constraints on the 2D image trajectories

Two significant difficulties arise. First, the image displacements between the reference frame and subsequent ones are large since we deal with long sequences. Secondly, as a consequence of the non-rigidity of the motion, multiple warps can explain the same pair of images causing ambiguity. In this paper we show that a multi-frame approach allows us to exploit temporal information, resolving these ambiguities and improving the overall quality of the optical flow. We make use of the strong correlation between 2D trajectories of different points on the same non-rigid surface. These trajectories lie on a lower dimensional subspace and we assume that the trajectory vector storing 2D positions of a point across time can be expressed compactly as a linear combination of a low-rank motion basis. This leads to a significant reduction in the dimensionality of the problem while implicitly imposing some form of temporal smoothness. Figure 2 depicts the lower dimensional trajectory subspace.

Fig. 2.

Fig. 2

The strong correlation between 2D trajectories of different points on the same non-rigid surface can be exploited to impose temporal coherence by modelling long term temporal coherence imposing subspace constraints. These trajectories lie on a lower dimensional manifold which leads to a significant reduction in the dimensionality of the problem while implicitly imposing some form of temporal smoothness

Subspace constraints have been used before both in the context of sparse point tracking (Irani 2002; Brand 2001; Torresani et al. 2001; Torresani and Bregler 2002) and optical flow (Irani 2002; Garg et al. 2010) in the rigid and non-rigid domains, to allow correspondences to be obtained in low textured areas. While Irani’s original rigid (Irani 2002) formulation along with its non-rigid extensions (Torresani et al. 2001; Brand 2001; Torresani and Bregler 2002) relied on minimizing the linearized brightness constraint without smoothness priors, Garg et al. (2010) extended the subspace constraints to the continuous domain in the non-rigid case using a variational approach. Nir et al. (2008) propose a variational approach to optical flow estimation based on a spatio-temporal model. However, all of the above approaches impose the subspace constraint as a hard constraint. Hard constraints are vulnerable to noise in the data and can be avoided by substituting them with principled robust constraints.In this paper we extend the use of multi-frame temporal smoothness constraints within a variational framework by providing a more principled energy formulation with a robust soft constraint which leads to improved results. In practice, we penalize deviations of the optical flow trajectories from the low-rank subspace manifold, which acts as a temporal regularization term over long sequences. We then take advantage of recent developments (Chambolle 2004; Chambolle and Pock 2011) in variational methods and optimize the energy defining a variant of the duality-based efficient numerical optimization scheme. We are also able to prove that our soft constraint is preferable to a hard constraint imposed via reparameterization. To do this we provide a formulation of the hard constraint and its optimization and we perform thorough experimental comparisons where we show that the results obtained via the soft constraint always outperform those obtained after reparameterization.

The paper is organized as follows. In Sect. 2 we describe related approaches and discuss the contributions of our work. Section 3 defines the trajectory subspace constraints that we use in our formulation. In Sect. 4 we describe the energy and provide a discussion on the design of our effective trajectory regularizer. Section 5 addresses the optimization of our proposed energy. This is followed by a description of the estimation of the motion basis in Sect. 6. In Sect. 7 we propose the extension of our algorithm to vector-valued images and Sect. 8 discusses implementation details. Finally Sect. 9 describes the alternative formulation of the subspace constraint as a hard constraint while Sect. 10 describes our experimental evaluation.

Related Work and Contribution

Variational methods formulate the optical flow or image alignment problems as the optimization of an energy functional in a continuous domain. Stemming from Horn and Schunck’s original approach (Horn and Schunck 1981), the energy incorporates a data term that accounts for the brightness constancy assumption and a regularization term that allows to fill-in flow information in low textured areas. Variational methods have seen a huge surge in recent years due to the development of more sophisticated and robust data fidelity terms which are robust to changes in image brightness or occlusions (Brox and Malik 2011; Brox et al. 2004); the addition of efficient regularization terms such as Total Variation (TV) (Zach et al. 2007; Wedel et al. 2008) or temporal smoothing terms (Weickert and Schnörr 2001b); and new optimization strategies that allow computation of highly accurate (Wedel et al. 2009) and real time optical flow (Zach et al. 2007) even in the presence of large displacements (Alvarez et al. 2000; Brox and Malik 2011; Steinbruecker et al. 2009).

One important recent advance in variational optical flow methods has been the development of the duality based efficient optimization of the so-called TV-Inline graphic formulation (Zach et al. 2007; Chambolle and Pock 2011) (which owes its name to the Total Variation that is used for regularization and the robust Inline graphic-norm that is used in the data fidelity term). An example of this class is the Improved TV-Inline graphic (ITV-Inline graphic) method (Wedel et al. 2009), which yielded notable quantitative performance, by also carefully considering some practical aspects of the optical flow algorithm.Duplication of the optimization variable via a quadratic relaxation is used to decouple the linearized data and regularization terms, decomposing the optimization problem into two, each of which is a convex energy that can be solved in a globally optimal manner. The minimization algorithm then alternates between solving for each of the two variables assuming the other one fixed. One of the key advantages of this decoupling scheme is that since the data term is point-wise independent, its optimization can be highly parallelized using graphics hardware (Zach et al. 2007). Following its success in optical flow computation, this optimization scheme has since been successfully applied to motion and disparity estimation (Pock et al. 2010) and real time dense 3D reconstruction (Newcombe et al. 2011; Stuehmer et al. 2010). In this work we adopt this efficient duality based TV-Inline graphic optimization scheme (Zach et al. 2007) and extend it to the case of multi-frame optical flow for video registration, by modelling long term temporal coherence imposing subspace constraints.

Despite being such a powerful cue most optical flow algorithms do not take advantage of temporal coherence and only work on pairs of images. Few previous attempts to multi-frame optical flow estimation exist in the literature (Weickert and Schnörr 2001b, a; Papadakis et al. 2007; Nir et al. 2008; Werlberger et al. 2009; Volz et al. 2011). Even in those cases, temporal smoothness constraints are only exploited over a very small number of frames (typically Inline graphic or Inline graphic frames either side of the current image) and not for an entire sequence. This is mostly due to the difficulty of providing an explicit model for longer term trajectories. In recent work Volz et al. (2011) report improvements in optical flow computation by imposing first and second order trajectory smoothness over Inline graphic frames. We take this further and exploit temporal coherence throught the entire video. Moreover, while previous approaches incorporate explicit temporal smoothness regularization terms over a few frames, our subspace constraint acts as an implicit long term trajectory regularization term leading to temporally consistent optical flow.

Our approach is related to the recent work of Garg et al. (2010) in which dense multi-frame optical flow for non-rigid motion is computed under hard subspace constraints. Our approach departs in a number of ways. First, while Garg et al. (2010) imposes the subspace constraint via reparameterization of the optical flow, we use a soft constraint and optimize over two sets of closely coupled flows, one that lies on the low-rank manifold and one that does not. Secondly, our use of a robust penalizer for the data term allows us to have more resilience than Garg et al. (2010) against occlusions and appearance changes. Moreover, our use of a modified Total Variation regularizer instead of the non-robust Inline graphic-norm and quadratic regularizer used by Garg et al. (2010) allows to preserve object boundaries. Finally, by providing a generalization of the subspace constraint, we have extended the approach to deal with any orthonormal basis and not just the PCA basis. More recently Ricco and Tomasi (2012) also proposed the use of subspace constraints to model multi-frame optical flow with explicit reasoning for occlusions. However, their approach is restricted to hard subspace constraints with a known PCA basis which is computed from sparse feature tracking.

Non-rigid image registration, has recently seen substantial progress in its robust estimation in the case of severe deformations and large baselines both from keypoint-based and learning based approaches. Successful keypoint-based approaches to deformable image registration include the parametric1 approach of Pizarro and Bartoli (2010) who propose a warp estimation algorithm that can cope with wide baseline and self-occlusions using a piecewise smoothness prior on the deforming surface. A direct approach that uses all the pixels in the image is used as a refinement step. Discriminative approaches on the other hand, learn the mapping that predicts the deformation parameters given a distorted image but require a large number of training samples. In recent work, Tian and Narasimhan (2010) combine generative and discriminative approaches which results in lowering the total number of training samples.

Our contribution

In this paper we adopt a robust approach to non-rigid image alignment where instead of imposing the hard constraint that the optical flow must lie on the low-rank manifold (Garg et al. 2010), we penalize flow fields that lie outside it. Formulating the manifold constraint as a soft constraint using variational principles (Garg et al. 2011) leads to an energy with a quadratic term that allows us to adopt a decoupling scheme, related to the one described above (Zach et al. 2007; Chambolle and Pock 2011), for its efficient optimization. We propose a new anisotropic trajectory regularization term, parameterized in terms of the basis coefficients, instead of the full flow field. This results in an important dimensionality reduction in this term, which is usually the bottleneck of other quadratic relaxation duality based approaches (Zach et al. 2007; Chambolle and Pock 2011). Moreover, the optimization of our regularization step can be highly parallelized due to the independence of the orthonormal basis coefficients adding further advantages to previous approaches. Our approach can be seen as an extension of Zach et al. (2007) efficient TV-Inline graphic flow estimation algorithm to the case of multi-frame non-rigid optical flow, where the addition of subspace constraints acts as a temporal regularization term. In practice, our approach is equivalent to Zach et al. (2007) in the degenerate case where the identity matrix is chosen as the motion basis.

We take advantage of the high level of parallelism inherent to our approach by developing a GPU implementation using the Nvidia CUDA framework. This parallel implementation vastly outperforms the equivalent Matlab code.

Additionally, we provide an extension of our multi-frame approach to the case of vector-valued images which allows us to use the information from all colour channels in image sequences, and further improve results. Our novel optimization scheme is based on the dualization of the linearized data term. Unlike Râket et al.’s previous attempt to extend TV-Inline graphic flow to vector valued images (Rakêt et al. 2011), our new algorithm is not restricted to the use of the Inline graphic-norm penaliser and instead allows the use of more general convex robust penalizers in the data term.

Currently, there are no benchmark datasets for the evaluation of optical flow that include long sequences of non-rigid deformations. In particular, the most popular one (Baker et al. 2011) (Middlebury) does not incorporate any such sequences. To facilitate the quantitative evaluation of multi-frame non-rigid registration and optical flow and to promote progress in this area, we provide a new dataset based on motion capture data of a flag waving in the wind, with dense ground truth optical flow.

Our quantitative evaluation on this dataset using different motion bases shows that our proposed approach improves on state of the art algorithms including large displacement (Brox and Malik 2011) and duality based (Zach et al. 2007) optical flow algorithms and the parametric dense non-rigid registration approach of Pizarro and Bartoli (2010).

Multi-frame Image Registration

Consider a video sequence of non-rigid objects moving and deforming in 3D. In the classical optical flow problem, one seeks to estimate the vector field of image point displacements independently for each pair of consecutive frames. In this paper, we adopt the following multi-frame reformulation of the problem. Taking one frame as the reference template, typically the first frame, our goal is to estimate the 2D trajectories of every point visible in the reference frame over the entire sequence, using a multi-frame approach (Fig. 1 illustrates our approach). The use of temporal information in this way allows us to predict the location of points not visible in a particular frame making us robust to self-occlusions or external occlusions by other objects.

Low-Rank Trajectory Space

To solve the multi-frame optical flow problem, we make use of the fact that the 2D image trajectories of points on an object are highly correlated, even when the object is deforming. We model this property by assuming that the trajectories lie near a low-dimensional linear subspace. This assumption is analogous to the non-rigid low-rank shape model, first proposed by Bregler et al. (2000), which states that the time varying 3D shape of a non-rigid object can be expressed as a linear combination of a low-rank shape basis. This rank constraint has been successfully exploited for 3D reconstruction by Non-Rigid Structure from Motion (NRSfM) algorithms (Torresani et al. 2008) where the matrix of 2D tracks is factorized into the product of two low-rank matrices: a motion matrix that describes the camera pose and time varying coefficients and a shape matrix that encodes the basis shapes.

The low-rank shape basis model of Bregler et al. (2000), Torresani et al. (2008) exploits the spatial properties of non-rigid motion, introducing rank constraints on the 3D location of the set of points (shape) at any given frame. Interestingly, the dual formulation of this model states that the rank constraint can be instead applied to the trajectories of each individual point, modelling them as a linear combination of basis trajectories. Therefore, the motion and shape matrices can exchange their roles as basis and coefficients and we can either interpret the 2D tracks as the projection of a linear combination of 3D basis shapes or as the linear combination of a 2D motion basis. This concept of non-rigid trajectory basis was first introduced in 2D by Torresani and Bregler (2002) who applied it to non-rigid 2D tracking as an extension of the rigid subspace constraints proposed by Irani (2002). Later Akhter et al. (2008, 2011) extended the trajectory basis to 3D to model non-rigid 3D trajectories using the Discrete Cosine Transform (DCT) basis.

Dense Trajectory Subspace Constraints

This paper extends the use of 2D trajectory subspace constraints to the case of estimating dense multi-frame optic flow using a variational approach.

More precisely, we assume that the input image sequence has Inline graphic frames and the Inline graphic-th frame, Inline graphic has been chosen as the reference. We denote by Inline graphic the image domain and we define the function:

graphic file with name M20.gif 1

that represents the point trajectories in the following way. For every visible point Inline graphic in the reference image, Inline graphic is its discrete-time 2D trajectory over all frames of the sequence. The coordinates of each trajectory Inline graphic are expressed with respect to the position of the point Inline graphic at Inline graphic which means that Inline graphic and that the location of the same point in frame Inline graphic is Inline graphicWe use the term multi-frame optical flow to describe Inline graphic since it corresponds to a multi-frame extension of the conventional optical flow: the latter is given by Inline graphic in the degenerate case where the sequence contains only Inline graphic frames and the first one is considered as the reference (Inline graphic).

Mathematically, the robust linear subspace constraint on the 2D trajectories Inline graphic can be expressed in the following way. For all Inline graphic and Inline graphic:

graphic file with name M36.gif 2

which states that the trajectory Inline graphic of any point Inline graphic can be approximated as the linear combination of Inline graphic basis trajectories Inline graphic that are independent from the point location. We include a modeling error Inline graphic which will allow us to impose the subspace constraint as a penalty term.Normally the values of Inline graphic are relatively small, yet sufficient to improve the robustness of the multi-frame optical flow estimation.

Note that we consider that the chosen trajectory basis is orthonormal. We refer to the linear span of these basis trajectories as a trajectory subspace and denote it by Inline graphic The linear combination is controlled by coefficients Inline graphic that depend on Inline graphic therefore we can interpret the collection of all the coefficients for all the points Inline graphic as a vector-valued image Inline graphic Figure 3 illustrates the subspace constraint.

Fig. 3.

Fig. 3

The displacement of any point throughout the sequence can be expressed in a compact way as a linear combination of a low-rank trajectory basis. The basis vectors Inline graphic encode the temporal information while the coefficient maps Inline graphic describe the spatial distribution of the individual basis trajectories

In many cases, effective choices for the model order (or rank) Inline graphic correspond to values smaller than Inline graphic which means that the above representation is compact and achieves a significant dimensionality reduction on the point trajectories.

We now re-write equation (2) in matrix notation, which will be useful in the subsequent presentation. Let Inline graphic and Inline graphic Inline graphic be equivalent representations of the functions Inline graphic and Inline graphic that are derived by vectorizing the dependence on the discrete time Inline graphic and let Inline graphic be the trajectory basis matrix whose columns contain the basis elements Inline graphic after vectorizing them in the same way:

graphic file with name M60.gif 3

The subspace constraint (2) can now be written as follows:

graphic file with name M61.gif 4

Non-Rigid Video Registration from Multi-frame Optical Flow

Let Inline graphic be the sequence of grayscale image frames, which are given either directly from the input frames or from the input frames after some preprocessing, such as structure-texture decomposition (Wedel et al. 2009).

In our formulation, the estimation of the multi-frame optical flow is equivalent to the simultaneous registration of all the frames with the reference frame Inline graphic: Recall that for every frame Inline graphic the coordinates Inline graphic yield the current location of any image point Inline graphic of the reference. Therefore, the image:

graphic file with name M67.gif 5

is the registered version of the image Inline graphic back to the reference Inline graphic or in other words it is the warping of the image Inline graphic to the image Inline graphic As it will be described later, we expect that the brightness differences between every registered image and the reference image to be small and therefore we use an appropriate brightness constancy term in our proposed energy.

Variational Multi-frame Optical Flow Estimation

In this section we show how dense motion estimation can be combined with the trajectory subspace constraints described in Sect. 3. In order to estimate the 2D trajectories of all the points, or equivalently simultaneously register all the frames with the reference frame Inline graphic we propose the following energy:

graphic file with name M73.gif 6

where

graphic file with name M74.gif 7
graphic file with name M75.gif 8
graphic file with name M76.gif 9

We minimize this energy jointly with respect to the point trajectories Inline graphic and their components on the trajectory subspace that are determined by the linear model coefficients Inline graphic We also add the constraint that Inline graphic since this corresponds to the flow from the reference image frame to itself. The positive constants Inline graphic and Inline graphic weigh the balance between the terms of the energy. Also, Inline graphic in (9) denotes the Huber norm of a vector and Inline graphic is a space-varying weighting function (see Sect. 4 for more details).

Note that the functions Inline graphic and Inline graphic determine two sets of trajectories that are relatively close to each other but not identical since the subspace constraint is imposed as a soft constraint.This improves the robustness of our method against overfitting to the image data in cases where the brightness constancy assumption fails. For this reason, we consider that the final output of our method are the trajectories Inline graphic that lie on the trajectory subspace and are directly derived by the coefficients Inline graphic

Description of the Energy

In this section we provide more details about the properties of the proposed energy (6).

The first term (Inline graphic) is a data attachment term that uses the robust Inline graphic-norm and is a direct multi-frame extension of the brightness constancy term used by most optical flow methods, e.g. Zach et al. (2007). It is based on the assumption that the image brightness Inline graphic at every pixel Inline graphic of the reference frame is preserved at its new location, Inline graphic in every frame of the sequence. The use of an Inline graphic-norm improves the robustness of the method since it allows deviations from this assumption, which might occur in real-world scenarios because of noise, illumination changes or occlusions of some points in some frames.

The second term (Inline graphic) penalizes the difference between the two sets of trajectories Inline graphic and Inline graphic and acts as a coupling (linking) term between them. This term serves as a soft constraint that the trajectories Inline graphic should be relatively close to the subspace spanned by the basis Inline graphicConcerning the weight Inline graphic the larger its value the more restrictive the subspace constraint becomes. Since the subspace of Inline graphic is low-dimensional, this constraint operates also as a temporal regularization that is able to perform temporal filling-in in cases of occlusions or other distortions.

An equivalent interpretation is that this term is derived from the constraint that the error Inline graphic in (2) has a bounded Inline graphic norm, i.e. Inline graphic for some appropriate constant Inline graphic Then Inline graphic corresponds to the Lagrange multiplier for this constraint.

The third term (Inline graphic) corresponds to the spatial regularization of the trajectory coefficients. This term penalizes spatial oscillations of each coefficient caused by image noise or other distortions but not strong discontinuities that are desirable in the borders of each object. In addition, this term allows to fill in textural information into flat regions from their neighbourhoods. Following Werlberger et al. (2009), Newcombe et al. (2011), we use the Huber norm over the gradient of each subspace coefficient Inline graphic which is defined as:

graphic file with name M108.gif 10

where Inline graphic is a relatively small constant. The Huber norm is a convex differentiable function that combines quadratic regularization in the interval Inline graphic with Total Variation regularization outside the interval.For small gradient magnitudes the Huber norm offers smooth solutions, whereas for larger magnitudes the discontinuity preserving properties of Total Variation are maintained. Following Alvarez et al. (1999), Wedel et al. (2009), Newcombe et al. (2011), we also incorporate a space-varying weight Inline graphic that depends on the reference image as follows:

graphic file with name M112.gif 11

where Inline graphic is a constant and Inline graphic is the standard deviation of the 2D Gaussian Inline graphic that convolves the reference image Inline graphic This weight encourages discontinuities in flow to coincide with edges of the reference image by reducing the regularisation strength near those edges.Further discussion on our proposed regularization term Inline graphic is provided in Sect. 4.

Connections to Previous Work

Interestingly, our adopted strategy of estimating two sets of trajectories, Inline graphic and Inline graphic resembles the techniques of quadratic relaxation and duplication of the optimization variable that have been previously used in the context of optical flow and depth map estimation (Zach et al. 2007; Pock et al. 2010; Stuehmer et al. 2010; Newcombe et al. 2011). Similarly, we benefit from the fact that the optimization problem can be decomposed into two parts, each of which is a convex energy2 that can be solved efficiently and in a globally optimal manner. However, our formulation offers an additional advantage: the spatial regularization step, which is the bottleneck in these optimization schemes, is computationally much more efficient since it is applied to the coefficients Inline graphic that normally have smaller dimensionality than the flow Inline graphic

Note that there is a degenerate case in which our proposed approach becomes equivalent to independently estimating the flow from the reference Inline graphic to each frame Inline graphic by applying Inline graphic times the ITV-Inline graphic optical flow algorithm (Wedel et al. 2009). This degenerate case occurs when:

  • The motion basis is set to Inline graphic where Inline graphic is the Inline graphic identity matrix, in which case Inline graphic; and

  • Inline graphic and Inline graphic

When Inline graphic and Inline graphic the terms Inline graphic become equivalent to Inline graphic and therefore our regularization term Inline graphic is a summation of Total Variation terms. Furthermore, the choice Inline graphic converts the energy (6) into a summation of Inline graphic decoupled energy terms Inline graphic:

graphic file with name M140.gif 12

Each term Inline graphic corresponds to a specific frame Inline graphic and depends only on Inline graphic and the two coefficients Inline graphic and Inline graphic These coefficients stacked together as a vector-valued function can be seen as the auxiliary variable of Inline graphic so the energy term Inline graphic is equivalent to the convex relaxation of the TV-Inline graphic functional used in Wedel et al. (2009).

Effective Trajectory Regularization

In this section we provide further intuition into our choice of multi-frame optical flow regularization Inline graphic The presentation of this section follows a constructive approach—we build our proposed regularizer from the simplest choice of regularization term in successive steps, each of which adds more complexity but improves its effectiveness. We start by revisiting common practices in the literature and conclude by proposing our novel anisotropic trajectory regularization term in the final step. Our goal is to regularize the multi-frame optical flow Inline graphic that lies on the trajectory subspace. Note that Inline graphic can be interpreted as a vector valued function with Inline graphic channels encoding the horizontal and vertical components of the optical flow at each frame as defined in equation (3).

Step 1. A simple choice would be homogeneous regularization of the multi-frame flow, which is a straightforward multi-frame generalization of the model of Horn and Schunck (1981):

$$\int_{\Omega} \left\| J\mathbf{u}(\mathbf{x}) \right\|_F^2 \, d\mathbf{x} \qquad (13)$$

where ‖·‖_F denotes the Frobenius norm of a matrix and J denotes the Jacobian operator (each row of the Jacobian contains the gradient of the corresponding channel of the flow). However, this regularizer leads to over-smoothing at motion boundaries, since the quadratic term excessively penalizes the large gradient magnitudes that correspond to motion discontinuities.
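As a concrete discretization of this quadratic regularizer, the energy can be evaluated with NumPy as follows; the array shapes and the ramp/step test signals are illustrative assumptions, not the paper's own code:

```python
import numpy as np

def quadratic_regularizer(u):
    """Discrete homogeneous (Horn-Schunck style) regularizer: the squared
    Frobenius norm of the Jacobian of the multi-channel flow, summed over
    all pixels. `u` has shape (2F, H, W): one channel per horizontal or
    vertical flow component of each frame."""
    energy = 0.0
    for channel in u:
        gy, gx = np.gradient(channel)      # per-channel spatial gradient
        energy += np.sum(gx**2 + gy**2)    # |grad u_k|^2 summed over pixels
    return energy

# A sharp motion boundary is penalized much more than a smooth ramp with the
# same total displacement, which is why this regularizer over-smooths edges.
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))[None]  # gradual ramp
step = np.zeros((1, 32, 32)); step[:, :, 16:] = 1.0          # sharp edge
assert quadratic_regularizer(step) > quadratic_regularizer(smooth)
```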

Step 2. A way to avoid this is to apply a robust function φ that penalizes outliers of the gradient less severely than the quadratic penalizer:

$$\int_{\Omega} \varphi\left( \left\| J\mathbf{u}(\mathbf{x}) \right\|_F^2 \right) d\mathbf{x} \qquad (14)$$

This choice is used in Nir et al. (2008), and when only two frames are taken into account it is equivalent to the regularizers used in Schnörr (1994), Weickert (1998), Brox and Malik (2011) (isotropic flow-driven regularization in the terminology of Weickert and Schnörr (2001a)). Examples of the robust function φ include the following:

  • φ(s) = √s, in which case the regularizer is the vectorial total variation (Sapiro 1997) of the vector-valued function that encodes the multi-frame optical flow.

  • φ chosen so that the penalizer is the Huber norm (10), which is the choice adopted in our approach.

The robust function φ in (14) penalizes outliers of the joint gradient norm less strongly and therefore allows discontinuities to occur. However, such outliers correspond only to points where all the channels of the flow display sharp discontinuities. If, for example, only a few channels have a high gradient at a point, then the joint norm is not treated as an outlier, since it remains low (because of the sum of squares over all channels involved in this norm). This regularizer is thus much less tolerant of motion boundaries that occur in individual channels.
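A toy numerical illustration of this dilution effect (the channel count and jump size are made-up numbers): a discontinuity confined to one of the 2F channels produces a joint gradient norm that is √(2F) times smaller than a coherent jump of the same per-channel size, so a robust penalty tuned to treat coherent motion boundaries as outliers will smooth the single-channel edge away.

```python
import numpy as np

F2 = 20                      # 2F channels for an F = 10 frame sequence
jump = 1.0

one_channel = np.zeros(F2); one_channel[0] = jump   # edge in one channel only
all_channels = np.full(F2, jump)                    # coherent edge, all channels

norm_one = np.linalg.norm(one_channel)   # joint norm = jump
norm_all = np.linalg.norm(all_channels)  # joint norm = jump * sqrt(2F)

# The coherent boundary registers sqrt(2F) times more strongly in the joint
# norm, even though the per-channel discontinuity is identical.
assert np.isclose(norm_all, np.sqrt(F2) * norm_one)
```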

Step 3.

The above problem can be addressed by applying the penalizer φ independently to the squared gradient norm of each channel of the flow:

$$\int_{\Omega} \sum_{k=1}^{2F} \varphi\left( \left| \nabla u_k(\mathbf{x}) \right|^2 \right) d\mathbf{x} \qquad (15)$$

This is a direct multi-frame extension of the regularizer used in Deriche et al. (1995), Kumar et al. (1996), Aubert et al. (1999), Zach et al. (2007), Wedel et al. (2009), for which efficient numerical implementations exist (Zach et al. 2007; Wedel et al. 2009). In this way, each channel of the flow can have different boundaries. However, this regularizer lies at the opposite extreme from the regularizer of Step 2: when substantial correlation exists between the different channels, it is ineffective, since it allows correlated trajectories to have different boundaries.

In addition, in contrast to the regularizers proposed in previous steps, it is not rotation invariant (Weickert and Schnörr 2001a).

Step 4.

To avoid the aforementioned problems, we adopt our subspace model for the 2D trajectories and rewrite the norm of the flow's Jacobian as a function of the coefficients:

$$\left\| J\mathbf{u}(\mathbf{x}) \right\|_F^2 = \left\| Q \, J\mathbf{L}(\mathbf{x}) \right\|_F^2 = \sum_{k=1}^{R} \left| \nabla L_k(\mathbf{x}) \right|^2 \qquad (16)$$

where we have used the orthonormality of the basis Q. Provided that the trajectory basis has been chosen appropriately, the coefficients are much less correlated than the channels of the flow. We conclude that it is more effective to apply the robust function φ independently to the basis coefficients (instead of the flow fields), and we derive the regularizer:

$$\int_{\Omega} \sum_{k=1}^{R} \varphi\left( \left| \nabla L_k(\mathbf{x}) \right|^2 \right) d\mathbf{x} \qquad (17)$$

Furthermore, this regularizer leads to a much more efficient implementation for two main reasons. First, the regularization is applied to the coefficients, which typically have lower dimensionality than the flow. Second, this regularization is decoupled for each coefficient and can thus be highly parallelized. Note that the regularizer (15) derived in Step 3 can be considered a special case of the above regularizer, obtained when the 2F × 2F identity matrix is chosen as the basis. However, in our work, we use two choices of basis: DCT and PCA (derived from an initial flow). We now analyze each of these cases separately:

  • When the basis matrix has been estimated by applying PCA to some trajectory samples, the correlation between the coefficients can be considered negligible. Furthermore, in this case we regain the desirable property of rotation invariance, since the proposed regularizer (17) is consistent with the general design principle of Weickert and Schnörr (2001a) for rotationally invariant anisotropic regularizers. According to that principle3, given an appropriate decomposition of the gradient energy into rotationally invariant expressions, one should apply the robust function to each expression separately, yielding a regularizer that is rotationally invariant and anisotropic. In our case, these expressions correspond to the coefficients, which are indeed rotation invariant: if we assume that a rotation of the input frames causes the same rotation to be applied to the trajectory samples, then the basis trajectories will be equally rotated. Therefore, the coefficients of a specific reference image point4 will remain invariant and the corresponding trajectory will simply be rotated.

  • In the case of the DCT basis, the above properties do not hold. However, the regularizer (17) with a DCT basis is much more effective than the regularizer (15), since the DCT frequency components of a trajectory are typically less correlated than its actual coordinates. This is because, when the actual motions of the image points are compositions of different physical motions, these motions are expected to be much more localized in the frequency domain than in the time domain.
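The orthonormality identity behind (16), which makes regularizing the R coefficient channels equivalent in energy to regularizing the 2F flow channels, can be checked numerically. The sketch below uses a random orthonormal basis and illustrative shapes; names are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
F2, R, H, W = 20, 6, 16, 16                # 2F = 20 flow channels, rank R = 6

Q, _ = np.linalg.qr(rng.standard_normal((F2, R)))  # orthonormal 2F x R basis
L = rng.standard_normal((R, H, W))                  # coefficient maps
U = np.einsum('fr,rhw->fhw', Q, L)                  # flow field u = Q L

def jac_frob_sq(v):
    """Sum over pixels of the squared Frobenius norm of the Jacobian."""
    gy, gx = np.gradient(v, axis=(1, 2))
    return np.sum(gx**2 + gy**2)

# Because the gradient is linear and Q has orthonormal columns,
# || J u ||_F^2 == || J L ||_F^2 at every pixel, hence also in total.
assert np.isclose(jac_frob_sq(U), jac_frob_sq(L))
```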

Step 5.

Finally, it is reasonable to assume that the boundaries of all the motion components tend to be a subset of the edges of the reference image. Following Alvarez et al. (1999), Wedel et al. (2009), Newcombe et al. (2011), in order to prevent smoothing across motion boundaries, our final regularizer is weighted by a space-varying function g that depends on the reference image, as described in (11).

In our extensive experiments, we have verified empirically that the introduction of such a weighting improves the accuracy of the multi-frame optical flow. This is in accordance with the experimental evidence reported in Wedel et al. (2009) for classical optical flow.

Optimization of the Proposed Energy

In order to minimize the energy (6), we follow a coarse-to-fine technique with multiple warping iterations (Brox et al. 2004). In every warping iteration, we use an initialization of the flow that comes from the previous iteration, and we approximate the data term (7) by linearizing the images around this initialization. After this approximation, the energy (6) becomes convex.

Following Zach et al. (2007), we implement the optimization of the energy (6) using an alternating approach. We decouple the data and regularization terms to decompose the optimization problem into two subproblems, each of which can be more easily solved. In this section we show how to adapt the method of Zach et al. (2007) to our problem, to take advantage of its computational efficiency and apply it to multi-frame subspace-constrained optical flow. The key difference from Zach et al. (2007) is that we do not solve for pairwise optical flow; instead we optimize over all the frames of the sequence while imposing the trajectory subspace constraint as a soft constraint.

We apply an alternating optimization, updating either Inline graphic or Inline graphic in every iteration, as follows:

  • Repeat until convergence:
      Minimization Step 1: keeping the flow fixed, update the coefficients by minimizing the energy with respect to them.
      Minimization Step 2: keeping the coefficients fixed, update the flow by minimizing the energy with respect to it.

Convergence is declared when the relative update of the flow and the coefficients is negligible according to an appropriate distance threshold. Since the energy value does not increase at any step and is bounded below by its global minimum, the above alternation is guaranteed to converge; as the linearized energy is convex, the limit is a global minimum point.

Minimization Step 1

Since in this step we keep the flow fixed, we observe that only the last two terms of the energy (6), the regularization term and the subspace coupling term, depend on the coefficients. Therefore we must minimize their sum with respect to the coefficients. Using the matrix notation defined in (4), we can write the coupling term as:

$$E_C = \int_{\Omega} \left\| \mathbf{u}(\mathbf{x}) - Q\,\mathbf{L}(\mathbf{x}) \right\|^2 d\mathbf{x} \qquad (18)$$

Let Q⊥ be a 2F × (2F − R) matrix whose columns form an orthonormal basis of the orthogonal complement of the trajectory subspace. Then the block matrix [Q Q⊥] is an orthonormal 2F × 2F matrix, which means that its columns form a basis of the full trajectory space. Consequently, any flow vector can be decomposed into two mutually orthogonal components as

$$\mathbf{u}(\mathbf{x}) = Q\,\mathbf{c}(\mathbf{x}) + Q^{\perp}\,\mathbf{c}^{\perp}(\mathbf{x}) \qquad (19)$$

where

$$\mathbf{c}(\mathbf{x}) = Q^{\top}\mathbf{u}(\mathbf{x}), \qquad \mathbf{c}^{\perp}(\mathbf{x}) = \left( Q^{\perp} \right)^{\top}\mathbf{u}(\mathbf{x}) \qquad (20)$$

are the coefficients that define the projections of the flow onto the trajectory subspace and its orthogonal complement, respectively. Equation (18) can now be further simplified:

$$E_C = \int_{\Omega} \left( \left\| \mathbf{c}(\mathbf{x}) - \mathbf{L}(\mathbf{x}) \right\|^2 + \left\| \mathbf{c}^{\perp}(\mathbf{x}) \right\|^2 \right) d\mathbf{x} \qquad (21)$$

due to the orthonormality of the columns of the basis and its complement (which makes the corresponding transforms isometric) and Pythagoras' theorem. The out-of-subspace component is constant with respect to the coefficients; therefore it can be ignored in the current minimization. In other words, with the flow fixed, penalizing the distance between the flow and a point on the trajectory subspace is equivalent to penalizing the distance between the coefficients of that point and the projection of the flow onto the subspace.
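The decomposition (19)-(21) can be sanity-checked numerically: the distance from a flow vector to any point of the subspace splits, by Pythagoras, into an in-subspace part and a constant out-of-subspace part. Shapes and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
F2, R = 12, 4
Qfull, _ = np.linalg.qr(rng.standard_normal((F2, F2)))
Q, Q_perp = Qfull[:, :R], Qfull[:, R:]   # orthonormal basis and its complement

u = rng.standard_normal(F2)              # a 2F-dimensional trajectory vector
L = rng.standard_normal(R)               # arbitrary subspace coefficients

lhs = np.sum((u - Q @ L)**2)
rhs = np.sum((Q.T @ u - L)**2) + np.sum((Q_perp.T @ u)**2)
assert np.isclose(lhs, rhs)              # Pythagoras: identity (21)

# The minimizing coefficients are simply the projection Q^T u, and the
# residual is exactly the out-of-subspace energy:
assert np.isclose(np.sum((u - Q @ (Q.T @ u))**2), np.sum((Q_perp.T @ u)**2))
```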

Thus, the minimization of Step 1 is equivalent to the minimization of:

$$\sum_{k=1}^{R} \int_{\Omega} \left( g(\mathbf{x}) \left| \nabla L_k(\mathbf{x}) \right|_{\varepsilon} + \alpha \left( L_k(\mathbf{x}) - c_k(\mathbf{x}) \right)^2 \right) d\mathbf{x} \qquad (22)$$

where c_k is the k-th coordinate of the projection c(x) = Qᵀu(x). We have finally obtained a new form of the energy in which the trajectory model coefficients are decoupled. The minimization of each term in the above sum can be done independently and corresponds to a small modification of the TV-L2 Rudin-Osher-Fatemi (ROF) model (Rudin et al. 1992) applied to each coefficient: this modification consists of incorporating an edge weighting g and replacing the norm of the gradient with its Huber version (10). This modified ROF model has recently been studied in Newcombe et al. (2011) for the problem of depth estimation. The optimum coefficient map is actually a regularized version of the corresponding projection c_k, and the extent of this regularization increases as the weight decreases.

The benefits of the computational efficiency of the above procedure are twofold. First, these independent minimizations can be parallelized. Second, several efficient algorithms exist to implement such regularization models. Appendix A describes the actual algorithm we used for the optimization of this energy, which is related to the method proposed in Newcombe et al. (2011).

Minimization Step 2

Keeping the coefficients fixed, we observe that only the first two terms of the energy (6), the data term and the subspace coupling term, depend on the flow, and therefore we have to minimize the following with respect to the flow:

[Equation (23): image not preserved]

where Inline graphic. This cost depends only on the value of the flow at the specific point and discrete time (and not on the derivatives of the flow). Therefore the variational minimization of Step 2 is equivalent to the minimization of a bivariate function of the flow at every spatiotemporal point independently.

We implement this point-wise minimization by applying the technique proposed in Zach et al. (2007) to every frame. More precisely, for every frame and point, the image is linearized around the initialization of the corresponding trajectory. The function to be minimized at every point then has the simple form of a quadratic term plus the absolute value of a linear term. The minimum can be found analytically using the thresholding scheme reported in Zach et al. (2007).
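For reference, the closed-form thresholding step of Zach et al. (2007) for this kind of pointwise problem can be sketched as follows. It solves min_u |u − v|²/(2θ) + λ|ρ(u)| with ρ(u) = r0 + g·(u − u0) the linearized brightness residual at one pixel (g is the image gradient). Variable names and the exact weighting convention are our own assumptions:

```python
import numpy as np

def tv_l1_threshold(v, u0, r0, g, lam, theta):
    """Closed-form minimizer of |u - v|^2/(2*theta) + lam*|r0 + g.(u - u0)|."""
    rho_v = r0 + g @ (v - u0)
    g2 = g @ g
    if rho_v < -lam * theta * g2:
        return v + lam * theta * g      # residual strongly negative: step up
    if rho_v > lam * theta * g2:
        return v - lam * theta * g      # residual strongly positive: step down
    return v - (rho_v / g2) * g if g2 > 0 else v  # small residual: cancel it

# In the small-residual case the linearized residual is cancelled exactly:
v, u0, g = np.array([0.3, -0.1]), np.zeros(2), np.array([1.0, 2.0])
u = tv_l1_threshold(v, u0, r0=-0.05, g=g, lam=0.1, theta=0.25)
assert abs(-0.05 + g @ (u - u0)) < 1e-12
```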

Derivation of the Trajectory Basis

Concerning the choice of 2D trajectory basis, we consider orthonormal bases, since this simplifies the analysis and calculations in our method (see Sect. 4). This assumption is not restrictive, since for any basis an orthonormal one can be found that spans the same subspace. We now describe several effective choices of trajectory basis that we have used in our formulation.

Predefined bases for single-valued discrete-time signals with F samples can be used to model each coordinate of the 2D trajectories separately. Assuming that the rank R is an even number, this single-valued basis should have R/2 elements, and the trajectory basis is given by:

$$Q = B \otimes I_2, \qquad B = \left[ \mathbf{b}_1 \; \cdots \; \mathbf{b}_{R/2} \right] \qquad (24)$$

Provided that the object moves and deforms smoothly, effective choices for the single-valued basis are (i) the first R/2 low-frequency basis elements of the 1D Discrete Cosine Transform (DCT), or (ii) a sampling of the basis elements of the Uniform Cubic B-Splines of rank R/2 over the sequence's time window, followed by orthonormalization of the resulting basis. The obvious advantage of using a predefined basis is that it does not need to be estimated in advance.
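A minimal sketch of building such a predefined trajectory basis from the orthonormal 1D DCT-II; the interleaving of x and y coordinates via the Kronecker product is our assumed convention, not necessarily the paper's exact row ordering:

```python
import numpy as np

def dct_trajectory_basis(F, R):
    """2F x R trajectory basis from the first R/2 low-frequency 1D DCT-II
    elements; rows are assumed ordered (x_1, y_1, ..., x_F, y_F)."""
    assert R % 2 == 0 and R <= 2 * F
    t = np.arange(F)
    B = np.empty((F, R // 2))
    for k in range(R // 2):
        b = np.cos(np.pi * (t + 0.5) * k / F)   # DCT-II basis element
        B[:, k] = b / np.linalg.norm(b)          # normalize each element
    return np.kron(B, np.eye(2))                 # one column each for x and y

Q = dct_trajectory_basis(F=30, R=10)
assert Q.shape == (60, 10)
assert np.allclose(Q.T @ Q, np.eye(10), atol=1e-10)  # orthonormal columns
```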

An alternative is to estimate the basis by applying Principal Component Analysis (PCA) to some sample trajectories. Provided that it is possible to estimate a set of sample trajectories that adequately represent the trajectories of the points over the whole object, the choice of the PCA basis is optimum for the linear model of a given rank Inline graphic in terms of representational power. In this work we consider two possibilities.

  • (i)

    The sample trajectories could come from an initial estimate of optical flow. We have found that the flow obtained using the DCT basis provides a very good initial flow, to which we then apply PCA to obtain an optimized basis.

  • (ii)

    Alternatively, the sample trajectories could be a small subset of reliable point tracks, which we consider to be those where the image texture is strong in both spatial directions; these can be selected using Shi and Tomasi's criterion (Shi and Tomasi 1994). However, this option is not resilient to outliers.

In practice, our experimental evaluation shows that the multi-frame optical flow obtained with the optimized PCA basis proposed in (i) provides the best results. It has the added advantage that, since we initialize the flow using our algorithm with the DCT basis, which is predefined and need not be estimated, the entire process is automated and less affected by outliers.
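Deriving the PCA basis from sample trajectories can be sketched as follows: the samples are stacked as columns of a 2F × N matrix and the R leading left singular vectors are kept. Whether a mean trajectory is subtracted first is not stated in this section; this sketch does not subtract one:

```python
import numpy as np

def pca_trajectory_basis(W, R):
    """W: 2F x N matrix of sample trajectories, one per column.
    Returns an orthonormal 2F x R basis of their principal subspace."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :R]

# Samples drawn from a rank-5 subspace are represented exactly by the
# rank-5 PCA basis (illustrative synthetic data).
rng = np.random.default_rng(2)
F2, true_rank, N = 40, 5, 200
ground = rng.standard_normal((F2, true_rank)) @ rng.standard_normal((true_rank, N))
Q = pca_trajectory_basis(ground, R=5)
assert np.allclose(Q.T @ Q, np.eye(5), atol=1e-10)
assert np.allclose(Q @ (Q.T @ ground), ground, atol=1e-6)
```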

Generalization to Sequences of Vector-Valued Images

The algorithm we have described so far assumes that the images in the sequence are grayscale. In this section we develop a generalization of our approach to the case of sequences of vector-valued images. We propose an optimization scheme that is based on the dualization of the data term of the energy.

The use of vector-valued images can significantly improve the accuracy of the estimated optical flow for several reasons. First of all, vector-valued images can incorporate all the colour channels of an image. The colour cue in a video offers important additional information and resolves ambiguities that are present in grayscale images. Furthermore, this generalization offers the potential for incorporating other powerful image cues as additional channels. For instance, the spatial derivatives of the colour channels can be added to impose the gradient constancy assumption (Uras et al. 1988; Brox et al. 2004; Papenberg et al. 2006; Brox and Malik 2011), or even more complex features such as SIFT features (Liu et al. 2011) or features derived using a Field-of-Experts formulation (Sun et al. 2008), which can improve robustness against illumination changes in the scene. Note that in our experimental evaluation we have only incorporated the colour channels. To cope with illumination changes we have used structure-texture decomposition as a preprocessing step, which is an alternative way to gain robustness (Wedel et al. 2009).

Proposed Dual Formulation

Let us assume that the video frames that are used in our data term are vector-valued images with C channels:

$$I_f : \Omega \rightarrow \mathbb{R}^{C} \qquad (25)$$

To cope with this more general case, we only have to modify two elements of our energy formulation: (i) the data term of the proposed energy (6), and (ii) the edge-weighting function of the regularization term described in (11), which depends on the reference image.

The original definition of the edge-weighting function is based on the gradient magnitude of the reference image, used as a simple edge-strength predictor. For vector-valued images, we use a common and natural extension of this predictor (Blomgren and Chan 1998; Tschumperlé and Deriche 2005) by adding the contributions of the different image channels. We thus generalize the edge-weighting function as follows:

[Equation (26): image not preserved]
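A sketch of such a channel-summed edge weight: the per-channel squared gradient magnitudes are added before being mapped through a decreasing function, so that an edge in any channel lowers the weight. The exponential form g = exp(−α·‖∇I‖^β) and the parameter values here are illustrative assumptions, not the exact definition (11):

```python
import numpy as np

def edge_weight(image, alpha=10.0, beta=1.0):
    """image: (C, H, W) multi-channel reference image.
    Returns a per-pixel weight in (0, 1]: low near edges, 1 in flat regions."""
    mag2 = np.zeros(image.shape[1:])
    for channel in image:
        gy, gx = np.gradient(channel)
        mag2 += gx**2 + gy**2           # sum contributions of all channels
    return np.exp(-alpha * np.sqrt(mag2)**beta)

img = np.zeros((3, 16, 16)); img[0, :, 8:] = 1.0  # edge in one channel only
g = edge_weight(img)
assert g[0, 0] == 1.0      # flat region: full regularization weight
assert g[0, 8] < 0.1       # edge in a single channel still reduces the weight
```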

Concerning the data term, we make a further generalization by applying a generic robust function5 φ to the image differences:

[Equation (27): image not preserved]

Our generalized data term becomes:

$$E_D[\mathbf{u}] = \sum_{f=1}^{F} \int_{\Omega} \varphi\left( I_f\left( \mathbf{x} + \mathbf{u}_f(\mathbf{x}) \right) - I_0(\mathbf{x}) \right) d\mathbf{x} \qquad (28)$$

Since only the data term is affected by the extension to vector-valued images, the optimization of our proposed energy (6) only requires a modification of the minimization with respect to the flow (Step 2 in Sect. 5). As in the case of grayscale images, this minimization is independent for every spatio-temporal point, but the point-wise energy that must be minimized with respect to the flow is now the following:

[Display equation: image not preserved]

For every point in every frame, each channel of the corresponding image is linearized around the initialization of the trajectories, which comes from the previous warping iteration. With this approximation, the point-wise energy can be written as:

[Equation (29): image not preserved]

where Inline graphic and Inline graphic is the Inline graphic (spatial) Jacobian of the Inline graphic-th frame Inline graphic evaluated at Inline graphic

Assuming that the function φ is proper, convex, and lower semi-continuous, we dualize it using its convex bi-conjugate (Rockafellar 1997; Chambolle and Pock 2011):

$$\varphi(\mathbf{z}) = \sup_{\mathbf{p}} \left\{ \langle \mathbf{p}, \mathbf{z} \rangle - \varphi^{*}(\mathbf{p}) \right\} \qquad (30)$$

where φ* is the Legendre-Fenchel transform of the robust function and p is the dual variable. We can now rewrite the energy (29) as:

[Equation (31): image not preserved]

Based on the above expression, we propose to minimize the energy by solving the following saddle point problem:

[Equation (32): image not preserved]

where

[Equation (33): image not preserved]

Given a specific choice for the robust function φ, one can derive efficient algorithms to solve the saddle point problem (32), using a framework similar to Esser et al. (2010), Chambolle and Pock (2011), Pock and Chambolle (2011). In Appendix B we provide such algorithms for two special cases of φ of particular interest:

  • φ equal to the Euclidean norm, which leads to the L2-norm of the image differences in (28). This is the choice that we use in our experiments on colour images.

  • φ equal to the Huber norm (10).
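The bi-conjugate identity (30) can be sanity-checked numerically for the simplest scalar case φ(z) = |z|, whose Legendre-Fenchel conjugate is the indicator of [−1, 1], so that |z| = max over |p| ≤ 1 of p·z. The grid search below is purely illustrative:

```python
import numpy as np

# Feasible dual variables: phi*(p) = 0 for |p| <= 1, +infinity otherwise,
# so the supremum in (30) restricts to p in [-1, 1].
p_grid = np.linspace(-1.0, 1.0, 2001)

def phi_via_biconjugate(z):
    """Recover phi(z) = |z| by brute-force maximization of p*z - phi*(p)."""
    return np.max(p_grid * z)

for z in [-2.5, -0.3, 0.0, 1.7]:
    assert np.isclose(phi_via_biconjugate(z), abs(z))
```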

Note that Rakêt et al. (2011) recently proposed an extension of the TV-L1 algorithm to vector-valued images. Their method corresponds to the choice of the Euclidean norm for φ and uses a step of projection onto an elliptic ball. The formulation that we propose in this section can be seen as an alternative to that work; its advantage is that it allows the use of more general robust functions.

Implementation Details

In this section we provide details about the implementation of the numerical optimization schemes for our grayscale and vector-valued multi-frame subspace optical flow algorithms.

We used a numerical optimization scheme and image preprocessing6 similar to those proposed in Wedel et al. (2009) to minimize the energy (6): we use the structure-texture decomposition to make our input robust to illumination artifacts due to shadows and shading reflections, and we also use blended versions of the image gradients and a median filter to reject flow outliers. Concerning the choice of parameters, the default values proposed in Wedel et al. (2009) for the ITV-L1 algorithm were found to give the best results for ITV-L1 and for our method on the benchmark sequence (5 warp iterations, 20 alternation iterations, and the two weights set to 30 and 2). The same settings were used in all our experiments on real sequences. Note that when we ran the colour version of our algorithm we downweighted the value of Inline graphic by a factor of Inline graphic to account for the three colour channels. Regarding the parameters of the space-varying weight of the regularization term defined in (11), we used the following values: Inline graphic pixel, Inline graphic and Inline graphic
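The structure-texture preprocessing removes the smooth, illumination-carrying component from each frame and keeps a blended texture residual. Wedel et al. (2009) compute the structure component with ROF denoising; the sketch below approximates it with a simple box blur purely for illustration, and the blend factor 0.95 is an assumption:

```python
import numpy as np

def box_blur(img, k=5):
    """Separable-free k x k box blur with edge padding (a crude stand-in
    for the ROF structure component)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def structure_texture(img, blend=0.95):
    """Subtract a blended smooth 'structure' component, keeping texture."""
    return img - blend * box_blur(img)

# A smooth illumination ramp is largely removed, while fine detail survives,
# so the output has less overall variation than the input.
ramp = np.tile(np.linspace(0, 1, 64), (64, 1))
texture = np.indices((64, 64)).sum(axis=0) % 2 * 0.2   # checkerboard detail
out = structure_texture(ramp + texture)
assert np.std(out) < np.std(ramp + texture)
```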

Since our algorithm can be efficiently parallelized on standard graphics hardware, we have developed a GPU implementation using the CUDA framework. We ran our algorithm on an NVIDIA GTX-580 GPU card hosted on a dual-core CPU and obtained an average speedup of Inline graphic with respect to our CPU Matlab implementation, which runs on a server with four quad-core processors and 192 GB of memory.

Reparameterization of the Optical Flow: Hard Subspace Constraint

In the special case where the error Inline graphic in (2) is close to zero everywhere in the image, or equivalently when Inline graphic in (6), our soft constraint becomes a hard constraint and the optical flow Inline graphic can be reparameterized as:

$$\mathbf{u}(\mathbf{x}) = Q\,\mathbf{L}(\mathbf{x}) \qquad (34)$$

where the coefficients of the motion basis are the unknown variables. In this case the energy for vector-valued images with C channels can be rewritten as:

$$\sum_{f=1}^{F} \int_{\Omega} \varphi\left( I_f\left( \mathbf{x} + Q_f\,\mathbf{L}(\mathbf{x}) \right) - I_0(\mathbf{x}) \right) d\mathbf{x} + \sum_{k=1}^{R} \int_{\Omega} g(\mathbf{x}) \left| \nabla L_k(\mathbf{x}) \right|_{\varepsilon} d\mathbf{x} \qquad (35)$$

where Q_f is the 2 × R matrix formed by the two rows of the basis matrix Q that correspond to frame f. Appendix C describes a primal-dual optimization algorithm to minimize this energy obtained via reparameterization of the flow.

A valid question at this point is: how does this hard subspace constraint compare with our proposed soft constraint? In Sect. 3 we argued that a soft constraint provides increased robustness. For this reason, in Sect. 10 we conduct a thorough experimental comparison between the two approaches, which reveals that it is indeed beneficial to allow deviations from the subspace: our robust soft constraint consistently outperforms imposing a hard constraint via reparameterization of the optical flow.

Experimental Results

In this section we evaluate our method and compare its performance with state of the art optical flow (Brox and Malik 2011; Zach et al. 2007) and image registration (Pizarro and Bartoli 2010) algorithms. We show quantitative comparative results on our new benchmark ground truth optical flow dataset and qualitative results on real-world sequences7.

Furthermore, we analyze the sensitivity of our algorithm to some of its parameters, such as the choice of trajectory basis and the regularization weight. Since our algorithm computes multi-frame optical flow and incorporates an implicit temporal regularization term, it would have been natural to compare its performance with a spatiotemporal optical flow formulation such as Weickert and Schnörr (2001b). However, due to the lack of publicly available implementations, we chose to compare with LDOF (Large Displacement Optical Flow) (Brox and Malik 2011), one of the best performing optical flow algorithms, which can deal with large displacements by integrating rich feature descriptors into a variational optical flow approach to compute dense flow. We also compare against the duality-based ITV-L1 (Improved TV-L1) algorithm (Wedel et al. 2009), which we use as a baseline since our method can be seen as its generalization to the case of multi-frame non-rigid optical flow via robust trajectory subspace constraints (see Sect. 4). In both cases, we register each frame in the sequence independently with the reference frame. We also compare with Pizarro and Bartoli's state of the art keypoint-based non-rigid registration algorithm (Pizarro and Bartoli 2010).

Note that all these algorithms can only be used on grayscale images.

Construction of a Ground Truth Benchmark Dataset

For the purpose of quantitative evaluation of multi-frame non-rigid optical flow, we have generated a new benchmark sequence with ground truth optical flow data. To the best of our knowledge, this is one of the first attempts to generate a long image sequence of a deformable object with dense ground truth 2D trajectories. We use sparse motion capture (MOCAP) data from White et al. (2007) to capture the real deformations of a waving flag in 3D. This sparse data is interpolated to create a continuous dense 3D surface, using the motion capture markers as the control points for smooth spline interpolation. Figure 4 shows four frames of the (a) sparse and (b) dense interpolated 3D flag surface. This dense 3D surface is then projected synthetically onto the image plane using an orthographic camera. We use texture mapping to associate texture with the surface while rendering 60 frames of size 500 × 500 pixels. We provide both grayscale and colour sequences. The advantage of this new sequence is that, since it is based on MOCAP data, it captures the complex natural deformations of a real non-rigid object while giving us access to dense ground truth optical flow. We have also used three degraded versions of the original rendered sequences, obtained by adding (i) Gaussian noise with standard deviation 0.2 relative to the range of image intensities, (ii) salt & pepper (S&P) noise of density 10%, and (iii) synthetic occlusions generated by superimposing black circles of radius 20 pixels moving in linear orbits. Figure 4 shows four frames of the original colour sequence, the ground truth optical flow, and the equivalent frames of the grayscale sequence with synthetic occlusions, Gaussian noise and salt & pepper noise.

Fig. 4.

Fig. 4

Rendering process for the ground truth optical flow sequence of a non-rigid object (different frames in each row). (a) Sparse surface representing the MOCAP data (White et al. 2007), (b) dense surface constructed using thin plate spline interpolation, (c) ground truth optical flow visualized with the colour coding shown in the figure, (d) colour sequence rendered from the dense surface using texture mapping of a graffiti image, (e) grayscale version of the same sequence, with superimposed red disks indicating the regions where intensities are replaced by black in the case of synthetic occlusions, (f) grayscale sequence with synthetic Gaussian noise, (g) grayscale sequence with synthetic salt and pepper noise (Color figure online)

Quantitative Results on Benchmark Sequence

We tested our Multi-Frame Subspace Flow algorithm for grayscale (mfsf) and colour (mfsf c) images using the three different proposed motion bases: PCA, DCT and Cubic B-Spline (Figs. 5, 6). In Table 1, we provide a quantitative comparison of the performance of the different versions of our algorithm against the state of the art methods listed above, using the four different versions of the rendered flag sequence as input. We report the root mean square (RMS) of the endpoint error, i.e. the magnitude of the difference between the ground truth and the estimated flow. These measures are computed over all frames and all foreground pixels. Note that the results obtained with the Spline basis were omitted since they were almost equivalent to those obtained with the DCT basis, as Fig. 6a reveals.
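The evaluation metric can be computed as follows: the RMS of the endpoint error, pooled over all foreground pixels of all frames. The array shapes and the foreground mask convention are our illustrative assumptions:

```python
import numpy as np

def rms_endpoint_error(flow_est, flow_gt, foreground):
    """flow_est, flow_gt: (F, 2, H, W) flows; foreground: (H, W) bool mask.
    Returns sqrt(mean of squared endpoint error) over masked pixels/frames."""
    diff = flow_est - flow_gt
    epe2 = np.sum(diff**2, axis=1)        # squared endpoint error per pixel
    return np.sqrt(np.mean(epe2[:, foreground]))

# A constant (3, 4)-pixel error everywhere yields an RMS error of 5 pixels.
F, H, W = 4, 8, 8
gt = np.zeros((F, 2, H, W))
est = gt + np.array([3.0, 4.0])[None, :, None, None]
mask = np.ones((H, W), dtype=bool)
assert np.isclose(rms_endpoint_error(est, gt, mask), 5.0)
```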

Fig. 5.

Fig. 5

Inverse warps and error maps for selected frames of the original benchmark sequence. Each row shows results for a different method. (a-b) Multi-frame subspace flow on colour images: (a) mfsf c (PCA), (b) mfsf c (DCT). (c-d) Multi-frame subspace flow on grayscale images: (c) mfsf (PCA), (d) mfsf (DCT). (e) ITV-L1 (Wedel et al. 2009), (f) LDOF (Brox and Malik 2011), (g) Pizarro and Bartoli (2010)

Fig. 6.

Fig. 6

(a) RMS flow error vs increasing values of the rank of the different trajectory bases (PCA, DCT, UCBS). The graph shows that the PCA motion basis provides the best results and that our algorithm does not overfit when the rank of the basis is overestimated. (b) RMS flow error vs increasing values of the weight of the subspace constraint. (c) RMS flow error for increasing values of the rank of the PCA basis on the different variants of the benchmark sequence (occlusions, Gaussian noise, salt & pepper noise). All experiments use our grayscale multi-frame subspace flow algorithm (mfsf)

Table 1.

RMS endpoint errors in pixels on the benchmark sequences of our proposed method for colour (mfsf c) and grayscale (mfsf) images using different motion bases (PCA, DCT and the identity basis)

Image type  Method                        Original  Occlusions  Gauss. noise  S&P noise
Color       mfsf c (PCA)                  0.69      0.80        1.25          1.01
Color       mfsf c (DCT)                  0.80      1.00        1.52          1.17
Grayscale   mfsf (PCA)                    0.75      0.85        1.52          1.18
Grayscale   mfsf (DCT)                    0.89      1.12        1.84          1.38
Grayscale   mfsf (identity basis)         1.13      1.43        1.83          1.60
Grayscale   ITV-L1 (Wedel et al. 2009)    1.43      1.89        2.61          2.34
Grayscale   LDOF (Brox and Malik 2011)    1.71      2.01        4.35          5.05
Grayscale   Pizarro and Bartoli (2010)    1.24      1.27        1.94          1.79

We compare the different versions of our grayscale algorithm (mfsf) against state of the art optical flow methods (ITV-L1, Wedel et al. 2009; LDOF, Brox and Malik 2011) and the non-rigid registration method of Pizarro and Bartoli (2010)

Numbers in bold highlight best performing color/grayscale algorithm

Fig. 7.

Fig. 7

Flow error maps on the benchmark sequence with synthetic occlusions for selected frames. Each column shows results for a different method and errors are displayed as heatmaps. (a-b) Multi-frame subspace flow on colour images: (a) mfsf c (PCA), (b) mfsf c (DCT). (c-d) Multi-frame subspace flow on grayscale images: (c) mfsf (PCA), (d) mfsf (DCT). (e) ITV-L1, Wedel et al. (2009), (f) LDOF, Brox and Malik (2011), (g) Pizarro and Bartoli (2010). It is easy to see from the error maps that the colour versions of our algorithm, (a) and (b), improve substantially on their grayscale counterparts, (c) and (d)

First we compare the performance of our original algorithm for grayscale images (mfsf) with ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011) and Pizarro and Bartoli (2010), since these algorithms can only be used on grayscale images. We report results for our algorithm using the full rank (Inline graphic) DCT basis (mfsf Inline graphic) and a full rank PCA basis (mfsf Inline graphic). Note that the PCA basis was estimated using as input the flow obtained after running our algorithm with the DCT basis (mfsf Inline graphic). We also ran our algorithm using the identity matrix as the basis (mfsf Inline graphic) to show how the results degrade when subspace constraints are not applied to compute the multi-frame optical flow.

Table 1 shows that our proposed algorithms (mfsf Inline graphic) and (mfsf Inline graphic) are the best performing grayscale algorithms, outperforming all other methods and yielding the lowest RMS errors on all the sequences: original, occlusions, Gaussian noise and salt & pepper noise. The best results are obtained using the PCA basis.
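The RMS endpoint errors reported in Tables 1 and 2 follow the standard definition for dense flow fields. As a minimal illustrative sketch (not the authors' evaluation code; the function name is ours), the metric can be computed as:

```python
import numpy as np

def rms_endpoint_error(flow_est, flow_gt):
    """RMS endpoint error in pixels between an estimated and a
    ground-truth flow field, each of shape (H, W, 2)."""
    diff = flow_est - flow_gt
    epe = np.sqrt((diff ** 2).sum(axis=-1))   # per-pixel endpoint error
    return np.sqrt((epe ** 2).mean())         # root mean square over pixels

# toy example: a constant (3, 4) offset gives an RMS error of exactly 5
gt = np.zeros((4, 4, 2))
est = gt + np.array([3.0, 4.0])
print(rms_endpoint_error(est, gt))  # prints 5.0
```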

Moreover, the top two rows of Table 1 show that using the novel extension of our algorithm to colour images (mfsf c) described in Sect. 7 significantly improves the results on all versions of the sequence. Once more, the results obtained using a full rank PCA basis (mfsf c Inline graphic) outperform those obtained with the DCT basis (mfsf c Inline graphic).

Regarding the choice of parameters, as we described in Sect. 8, the default values proposed in Wedel et al. (2009) for the ITV-Inline graphic algorithm were also found to give the best results with our grayscale algorithm (mfsf).8

However, we found that these parameters needed some tuning on the noisy and occluded versions of our benchmark sequence. A lower value of the data term weight Inline graphic was found to provide the best results. Additionally, on the noisy sequences, the weight of the quadratic term was lowered to Inline graphic. These modified values were used for mfsf Inline graphic, mfsf Inline graphic and mfsf Inline graphic.

Figure 5 shows a visual comparison of the results on the benchmark sequence reported in Table 1. We show a closeup of the reverse warped images Inline graphic of three frames in the sequence (Inline graphic), which should look identical to the template frame, and the error in the flow estimation Inline graphic for the same frames, expressed in pixels and encoded as a heatmap. Notice the significant improvements that our proposed algorithms for colour images (mfsf c Inline graphic, mfsf c Inline graphic) show with respect to their grayscale counterparts (mfsf Inline graphic, mfsf Inline graphic). Overall, all our approaches outperform the state of the art methods: ITV-Inline graphic optical flow (Wedel et al. 2009), LDOF (Brox and Malik 2011) and Pizarro and Bartoli's registration algorithm (Pizarro and Bartoli 2010).

Figure 7 shows results of the experiments on the benchmark sequence with synthetic occlusions. The error maps Inline graphic for images (Inline graphic), encoded as heatmaps, are shown for all the variants of our grayscale (mfsf Inline graphic, mfsf Inline graphic) and colour (mfsf c Inline graphic, mfsf c Inline graphic) algorithms as well as for ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011) and Pizarro and Bartoli (2010). We observe the same behaviour as in the experiments without occlusions: the error maps obtained with our algorithms show superior performance compared with state of the art approaches. Amongst our proposed approaches, one can observe significant improvements of the colour versions over their grayscale equivalents.

Figure 6a plots the RMS error over all frames of the optical flow estimated with the three different bases, for varying values of the rank and of the weight Inline graphic associated with the soft constraint. For a reasonably large value of Inline graphic, all the bases can be used with a significant reduction in the rank. The optimization also does not appear to overfit when the dimensionality of the subspace is overly high. Figure 6c confirms the same behaviour in the case of noisy images and sequences with occlusions. Figure 6b explores the effect of varying the weight Inline graphic on the accuracy of the optical flow. While low values of Inline graphic cause numerical instability (the data and regularization terms become completely decoupled), high values of Inline graphic lead to slow convergence and errors, since the point-wise search is not allowed to leave the manifold, simulating a hard constraint. Another interesting observation is that our proposed method with a PCA basis of rank Inline graphic=50 yields better performance than with a full rank PCA basis Inline graphic=120. This reflects the fact that the temporal regularization due to the low dimensional subspace is often beneficial. Note that, to analyze the sensitivity of our algorithm to its parameters in Fig. 6a–c, we used ground truth tracks to compute the PCA basis so as to remove any bias introduced by tracking.

Experimental Comparison of Soft Versus Hard Subspace Constraint

In this section we use the synthetic grayscale flag sequence to conduct an experimental comparison of the optical flow obtained using our proposed soft subspace constraint with that obtained imposing the hard constraint described in Sect. 9. The energy associated with the hard constraint (59) can be obtained by removing the quadratic term Inline graphic from our energy (6) and reparameterizing the optical flow in terms of the trajectory coefficients.

We use the primal-dual algorithm described in Appendix C to minimise the reparameterized energy (59), running 200 iterations per warp. We observed that 200 iterations were enough for the cost function to converge to a reasonable tolerance (which we consider to be when the change in cost per iteration is Inline graphicth of the total change).

Our energy (6), based on the soft subspace constraint, is minimized using our optimization scheme described in Sect. 5. To establish a fair comparison, we used 20 denoising iterations for the regularization step and 20 alternation iterations between the minimisation of Step 1 and Step 2 to ensure convergence.

Table 2 reports the RMS endpoint error, measured in pixels, of the flow obtained with the soft (S) and hard (H) constraints using three different bases:

  1. Low rank (Inline graphic) PCA basis obtained from sparse tracking using Pizarro and Bartoli (2010).

  2. Full rank PCA basis obtained from ground truth optical flow.

  3. Full rank DCT basis.
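For reference, a full rank DCT trajectory basis of the kind used in the third experiment can be built as an orthonormal DCT-II matrix whose columns are low-frequency trajectories. This is a hedged sketch with our own function name; the paper does not specify its exact DCT construction.

```python
import numpy as np

def dct_basis(F, K):
    """Orthonormal DCT-II basis: K lowest-frequency trajectories of length F.
    Returns an (F, K) matrix whose columns are the basis trajectories."""
    t = np.arange(F)
    # column k: cos(pi * (2t + 1) * k / (2F)), scaled for orthonormality
    Phi = np.cos(np.pi * (2 * t[:, None] + 1) * np.arange(K)[None, :] / (2 * F))
    Phi *= np.sqrt(2.0 / F)
    Phi[:, 0] /= np.sqrt(2.0)   # DC column rescaled so all columns have unit norm
    return Phi

Phi = dct_basis(120, 120)       # full rank basis for a 120-frame sequence
print(np.allclose(Phi.T @ Phi, np.eye(120)))  # prints True: orthonormal columns
```

Truncating to the first K columns gives the low-rank variant; unlike the PCA basis, this requires no initial tracking.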

The comparative results in Table 2 show that the optical flow obtained with our soft constraint consistently outperforms the flow obtained after reparameterization (hard constraint) in all three experiments on all the different sequences (original, noisy and with occlusions). This is particularly the case in the presence of Gaussian noise, where the endpoint errors differ most. This is to be expected, since our soft constraint allows some deviations from the subspace manifold.

Table 2.

RMS endpoint error in pixels for the optical flow obtained with the hard (H) versus soft (S) constraints

Basis Rank Constraint Version of input sequence:
Original Occl. Gauss. noise S&P noise
Sparse PCA 75 Soft (S) 0.90 1.01 1.80 1.46
Hard (H) 0.98 1.05 2.22 1.60
GT PCA 120 Soft (S) 0.69 0.76 1.43 1.07
Hard (H) 0.70 0.77 1.65 1.08
DCT 120 Soft (S) 0.89 1.12 1.83 1.38
Hard (H) 1.09 1.28 2.00 1.42

We carry out 3 experiments using: (top) a low-rank sparse PCA basis (using tracks given by  Pizarro and Bartoli (2010)); (middle) a full rank ground truth PCA basis (computed using the ground truth optical flow); and (bottom) a full rank DCT basis. The algorithms were tested on all the different types of sequence (original, noisy and with occlusions)

In the first experiment we used a low rank PCA basis estimated from sparse tracking (obtained using Pizarro and Bartoli's matching algorithm (Pizarro and Bartoli 2010)) to test the case of an inaccurate basis. This is the case in which allowing deviations from the subspace manifold is most clearly beneficial, which is naturally reflected in the significantly higher endpoint errors of the flow computed with the hard constraint compared with that computed with our soft constraint.

It is also interesting to observe that even when we used the full rank PCA basis computed from the ground truth flow, the soft constraint performs marginally better than the hard constraint; in the sequence with Gaussian noise it provides a clearer benefit. Finally, the third experiment, with a full rank DCT basis, also shows that it is beneficial to use a soft constraint on all the different image sequences.

In conclusion, the optical flow obtained using the subspace constraint as a soft constraint consistently outperforms the flow obtained by reparameterization when both algorithms are run until convergence. The benefits of the soft constraint are strongest when dealing with noisy images and with an inaccurate motion basis, which is to be expected.

Experiments on Real Sequences

In this section we provide details about the experiments we have carried out on four video sequences which display large displacements and strong deformations.

Actor sequence

This challenging sequence is a 39 frame long clip from a well known film, acquired at Inline graphic frames per second with images of size Inline graphic pixels. The top two rows of Fig. 8 show Inline graphic frames of this sequence in grayscale and colour. Note that frame Inline graphic was used as the reference frame.9 The bottom four rows in Fig. 8 show comparative results of the inverse warp images (using the computed optical flow to warp the current image back to the reference frame) estimated using the following different versions of our algorithm: mfsf Inline graphic, mfsf Inline graphic, mfsf c Inline graphic, mfsf c Inline graphic. The first two methods work on grayscale images and use the identity matrix and the PCA basis as the motion basis respectively, while the last two are their colour equivalents. Comparing the results of mfsf Inline graphic and mfsf Inline graphic (or mfsf c Inline graphic and mfsf c Inline graphic) allows us to show the advantages of using subspace constraints (PCA basis) over not using a temporal model for the trajectories (Inline graphic basis). We use a full rank PCA basis obtained after applying principal components analysis to an initial flow estimated with our algorithm using the DCT basis.

Fig. 8.

Fig. 8

Results on the Actor sequence: (a–b) Some frames of the grayscale and colour input sequences. This is a challenging sequence with large displacements and strong deformations. Frame 31 Inline graphic is used as the reference frame. (c–d) Inverse warp images Inline graphic comparing two versions of our grayscale algorithm: (c) without subspace constraints (mfsf Inline graphic) and (d) with subspace constraints (mfsf Inline graphic). (e–f) Inverse warp images Inline graphic comparing two versions of our colour algorithm: (e) without subspace constraints (mfsf c Inline graphic) and (f) with subspace constraints (mfsf c Inline graphic)

The advantages of using subspace constraints are clear. For instance, notice that for grayscale images mfsf Inline graphic failed completely to warp frame Inline graphic, while mfsf Inline graphic provides an accurate inverse warp image for the same frame and consistently superior results throughout the sequence. It is also clear that making use of all three colour channels through the extension of our algorithm to vector valued images provides substantial improvements. Both mfsf c Inline graphic and mfsf c Inline graphic outperform their grayscale equivalents. In row (d) of Fig. 8 we have highlighted in red the areas where the flow has clearly failed for the grayscale mfsf Inline graphic algorithm but that have been correctly warped by its colour version mfsf c Inline graphic.

Notice also that mfsf c Inline graphic copes with the large displacements in frame Inline graphic much better than mfsf Inline graphic. However, just using colour without subspace constraints is not enough to estimate accurate flow. Comparing the bottom two rows of Fig. 8 reveals that using subspace constraints significantly improves results in the colour case too. In conclusion, the best overall results are obtained with mfsf c Inline graphic, our colour algorithm with subspace constraints using the PCA basis.

Figures 9 and 10 support our claims by showing a grid superimposed on the images to reveal the optical flow at a sparse subset of points. The points on the mouth are highlighted in yellow since that is where most of the deformation occurs. Once more, Fig. 9 reveals that the quality of the flow computed using trajectory regularization constraints on grayscale images (mfsf Inline graphic) is far better than that obtained without subspace constraints (mfsf Inline graphic). Notice the complete failure of mfsf Inline graphic on frame Inline graphic. Similar conclusions can be drawn from the results on the colour images shown in Fig. 10. Notice the improvements, particularly on the lips.

Fig. 9.

Fig. 9

Results on the grayscale Actor sequence: Top row (a) shows some frames of the original grayscale sequence. Middle (b) and bottom (c) rows compare the optical flow results obtained with two of our proposed grayscale algorithms: (c) with subspace constraints (mfsf Inline graphic) and (b) without subspace constraints (mfsf Inline graphic). The flow is visualized with a grid superimposed on the images to reveal the optical flow in a sparse subset of points. Points on the mouth are shown in yellow to highlight the results on the area with strongest deformations (Color figure online)

Fig. 10.

Fig. 10

Results on the colour Actor sequence: Top row (a) shows some frames of the original colour sequence. Middle (b) and bottom (c) rows compare the optical flow results obtained with two of our proposed colour algorithms: (c) with subspace constraints (mfsf c Inline graphic) and (b) without subspace constraints (mfsf c Inline graphic). The flow is visualized with a grid superimposed on the images to reveal the optical flow in a sparse subset of points. Points on the mouth are shown in yellow to highlight the results on the area with strongest deformations (Color figure online)

Actress sequence

This Inline graphic frame long clip from the same film shows a close-up of an actress opening her mouth widely. The resolution of the images is Inline graphic pixels. This sequence is similarly challenging to the previous one, with very large displacements and deformations. In this case we only ran our best performing method on grayscale images, mfsf Inline graphic, with subspace constraints using a PCA basis of rank Inline graphic. Figure 11 shows the original sequence (top row); the inverse warp images estimated from the optical flow (middle row); and the original images augmented with some texture (bottom row) to simulate a tattoo.

Fig. 11.

Fig. 11

Results on the Actress sequence: (a) Some frames of the original grayscale sequence. (b) Inverse warp images obtained with our best performing grayscale method using subspace constraints (mfsf Inline graphic). (c) Original images augmented with some texture to simulate a tattoo

Paper bending-1 sequence

Figure 12 shows results on a sequence of textured paper bending smoothly (Bartoli et al. 2008); a challenging sequence due to its length (Inline graphic frames) and the large camera rotation. We compare our best performing grayscale algorithm (mfsf Inline graphic) against the state of the art optical flow methods ITV-Inline graphic (Wedel et al. 2009) and LDOF (Brox and Malik 2011). For completeness in our experimental evaluation, in this case we computed the motion basis by applying PCA to KLT tracks (Lucas and Kanade 1981), keeping the first 12 components. We ran the LDOF and ITV-Inline graphic algorithms using a multi-resolution scaling factor of 0.95, whereas for our algorithm the value 0.75 was sufficient (pointing to faster convergence). Comparing the warped images Inline graphic, we observe that our method yields a significant improvement in the accuracy of the optical flow, especially after some frames (see e.g. the artifacts annotated by the red ellipses in the results of LDOF and ITV-Inline graphic). We also show an alternative visualization of the same results, with a grid superimposed on the images to reveal the optical flow at a sparse subset of points. This visualization helps to highlight the superiority of the optical flow estimated with our algorithm (mfsf Inline graphic) over the others.

Fig. 12.

Fig. 12

Results on the Paper bending-1 grayscale sequence: Comparative results of the optical flow estimated with our best performing grayscale algorithm (mfsf Inline graphic) against state of the art optical flow methods (ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011)). We show two visualizations of the optical flow estimated with the three methods in alternate rows: (i) the inverse warped images and (ii) a grid superimposed on the images to reveal the optical flow at a sparse subset of points. Top row shows some frames of the original sequence

In Fig. 13 we show results on the colour version of this sequence, subsampled by taking every fifth frame to give a Inline graphic frame long sequence. Here we augment the images with new texture using the optical flow given by our colour multi-frame subspace algorithm with a PCA basis (mfsf c Inline graphic). The full rank PCA basis was obtained by applying principal components analysis to an initial flow estimated with our algorithm using the DCT basis (mfsf c Inline graphic).

Fig. 13.

Fig. 13

Results on the Paper bending-1 colour sequence: The top row shows some frames of the original colour sequence. The bottom row displays the same sequence augmented with some new texture. The optical flow obtained with our best performing colour algorithm mfsf c Inline graphic was used to re-texture the original sequence

Paper bending-2 sequence

Figure 14 shows a Inline graphic frame long grayscale sequence, introduced in Varol et al. (2009), of a paper being bent backwards, which is widely used for 3D reconstruction in non-rigid structure from motion (NRSfM). Our method used a PCA basis of rank Inline graphic obtained from KLT tracks. The Inline graphicth frame is used as the reference. Once more, we compare results of our algorithm (mfsf Inline graphic) against the same state of the art approaches as in previous experiments. The inverse warped images and the colour coded optical flow in Fig. 14 reveal that, despite having used a very low rank PCA motion basis, our results outperform LDOF and provide more accurate flow boundaries than ITV-Inline graphic.

Fig. 14.

Fig. 14

Results on the Paper bending-2 sequence: Top row shows some images of this grayscale sequence. The 30th frame is used as the reference. Next rows show inverse warp images and colour coded optical flow comparing our best performing grayscale algorithm (mfsf Inline graphic) using a very low rank PCA decomposition (Inline graphic) against state of the art optical flow methods (ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011))

Conclusions

We have provided a new formulation for the computation of multi-frame optical flow exploiting the high correlation between 2D trajectories of points in a long sequence by assuming that these lie close to a low dimensional subspace. Our main contribution is to formulate the manifold constraint as a soft constraint which, using variational principles, leads to a robust energy that can be efficiently optimized. We propose a new anisotropic trajectory regularization term that acts on the coefficients of the trajectory basis. We take advantage of the high level of parallelism inherent to our approach by developing a GPU implementation using the Nvidia CUDA framework. We also provide an extension of our approach to the case of vector-valued images which allows us to exploit all three colour channels and gain substantial improvements in the accuracy of the estimated optical flow. We also provide a new benchmark dataset, with ground truth optical flow. Our experimental results on the benchmark dataset and on real video footage reveal that using subspace constraints significantly improves results. Our approach outperforms state of the art optical flow and non-rigid registration algorithms.

Acknowledgments

This work is supported by the European Research Council under ERC Starting Grant agreement 204871-HUMANIS. We thank T. Collins for his texture mapping code and D. Pizarro for providing results of their method (Pizarro and Bartoli 2010) and tracks for the synthetic sequence. We also thank A. Handa and L. Pizarro for fruitful discussions.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix A: Primal Dual Algorithm for Denoising

This appendix describes the optimization of the energy minimized in Step 1 of our algorithm as defined in (22):

graphic file with name M514.gif 36

which corresponds to a small modification of the TV-Inline graphic Rudin-Osher-Fatemi (ROF) model (Rudin et al. 1992), as described in Sect. 5.1. Note that, as the trajectory model coefficients Inline graphic in (22) are decoupled for each Inline graphic, in the following derivation we drop the subscript for simplicity.

The first step in the optimization is the dualisation of the weighted Huber functional Inline graphic of the above energy with respect to the gradient Inline graphic using its Legendre-Fenchel transform (Rockafellar 1997). After spatial discretization, the minimisation of (36) is equivalent to the following saddle point problem:

graphic file with name M520.gif 37

where Inline graphic is the set of image grid points, Inline graphic denotes the discrete gradient operator as defined in Chambolle and Pock (2011), Inline graphic are the dual variables for every Inline graphic and Inline graphic is the indicator function of the unit ball:

graphic file with name M526.gif 38

The problem (37) can be considered as a special case of the following general form of primal-dual problems that are studied in Chambolle and Pock (2011):

graphic file with name M527.gif 39

In the case of (37), the norm of the linear operator Inline graphic is bounded by Inline graphic. Also, both Inline graphic and Inline graphic are uniformly convex, with convexity parameters Inline graphic and Inline graphic respectively.

Therefore, we solve (37) by applying Algorithm 3 of Chambolle and Pock (2011). The steps of the algorithm can be written as follows:

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:

graphic file with name M536.gif 40
graphic file with name M537.gif 41
graphic file with name M538.gif 42

where Inline graphic is the discrete divergence operator and the operator Inline graphic projects a vector Inline graphic onto the unit ball as:

graphic file with name M542.gif 43

We choose the following values for the step sizes Inline graphic, which guarantee convergence:

graphic file with name M544.gif 44
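To make the structure of this primal-dual scheme concrete, the sketch below implements the plain ROF special case (isotropic TV with a quadratic data term, uniform weights, no Huber smoothing) using the standard primal-dual iterations of Chambolle and Pock. The operators, step sizes and names are our illustrative assumptions, not the paper's exact weighted formulation.

```python
import numpy as np

def grad(u):
    # forward differences with Neumann (replicate) boundary conditions
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    # discrete divergence: negative adjoint of the forward-difference gradient
    d = np.zeros_like(px)
    d[:, :-1] += px[:, :-1]; d[:, 1:] -= px[:, :-1]
    d[:-1, :] += py[:-1, :]; d[1:, :] -= py[:-1, :]
    return d

def rof_denoise(f, lam=8.0, n_iter=200):
    """min_u ||grad u||_1 + lam/2 ||u - f||^2 via a primal-dual scheme."""
    u = f.copy(); u_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    tau = sigma = 1.0 / np.sqrt(8.0)     # tau * sigma * L^2 <= 1, with L^2 = 8
    for _ in range(n_iter):
        # dual ascent followed by pointwise projection onto the unit ball
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px, py = px / norm, py / norm
        # primal descent: closed-form prox of the quadratic data term
        u_old = u
        u = (u + tau * (div(px, py) + lam * f)) / (1.0 + tau * lam)
        u_bar = 2 * u - u_old            # over-relaxation step
    return u
```

Denoising a noisy flat image with this routine reduces the error with respect to the clean image, which is the role played by Step 1 in the alternation scheme.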

Appendix B: Primal Dual Algorithm for Robust Vector-Valued Image Matching

This appendix provides the details of the algorithm to optimise the saddle point problem (32) for vector-valued images using Euclidean norm and Huber penalisers.

Euclidean Norm Penaliser

This case corresponds to Inline graphic and is a straightforward extension of the absolute value of image differences that we used for Inline graphic in (7) for grayscale images. After dualisation, (32) can be written as:

graphic file with name M547.gif 45

This problem is also a special case of the general saddle point problem (39), with the linear operator Inline graphic. Since the function Inline graphic is uniformly convex with convexity parameter Inline graphic, we apply Algorithm 2 of Chambolle and Pock (2011) and derive the following optimisation algorithm:

  • Choose Inline graphic

  • Initialize Inline graphic from the previous alternation iteration.

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:
    graphic file with name M555.gif 46
    graphic file with name M556.gif 47
    graphic file with name M557.gif 48
    graphic file with name M558.gif 49

where Inline graphic can be any upper bound on the norm of Inline graphic. Although the saddle point problem is minimised separately for each spatio-temporal point of the video and Inline graphic is spatially varying, for simplicity we choose a common upper bound on the linear operator for all the points. It can be shown that Inline graphic, as defined below, is a valid upper bound:

graphic file with name M563.gif 50

where Inline graphic are the horizontal and vertical coordinate axes of the image plane.

Huber Penaliser

When the robust function used in the data term of the energy for vector-valued images is the Huber norm Inline graphic, the saddle point problem (32) can be written as:

graphic file with name M566.gif 51

This problem is again of the form (39), with the linear operator Inline graphic. The corresponding Inline graphic and Inline graphic functions are both uniformly convex, with parameters Inline graphic and Inline graphic. We thus solve (51) using Algorithm 3 of Chambolle and Pock (2011) and derive the following optimisation algorithm:

  • Initialize Inline graphic from the previous alternation iteration.

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:
    graphic file with name M575.gif 52
    graphic file with name M576.gif 53
    graphic file with name M577.gif 54

We choose the following step-sizes which ensure the convergence of our algorithm:

graphic file with name M578.gif 55

where Inline graphic is, again, any upper bound on the operator norm of Inline graphic. As in the case of Euclidean norm penalisation, we choose Inline graphic as defined in (50).

Appendix C: Optimization of the Hard Subspace Constraint

This appendix describes the optimization of the energy

graphic file with name M582.gif 56

which corresponds to the case when the subspace constraint is imposed as a hard constraint and the 2D flow Inline graphic can be reparameterized as Inline graphic. First, each image channel of Inline graphic is linearised around Inline graphic using an initial estimate Inline graphic. Under this approximation the data term can be written as:

graphic file with name M588.gif 57

where, for every spatio-temporal point Inline graphic

graphic file with name M590.gif 58

is the Inline graphic Jacobian matrix and Inline graphic Inline graphic is a Inline graphic dimensional vector.

Thus, the following minimization problem must be solved:

graphic file with name M595.gif 59

where Inline graphic is the linearised colour constancy term. After dualisation of the data and regularisation terms and spatial discretization, the minimisation (59) is equivalent to the following saddle point problem:

graphic file with name M597.gif 60

where Inline graphic and Inline graphic are the dual variables for every Inline graphic and Inline graphic, respectively.

The energy (60) can be considered as a special case of the general form of primal-dual problem (39) where the linear operator Inline graphic is the Inline graphic dimensional matrix:

graphic file with name M604.gif 61

where Inline graphic are the image grid points and Inline graphic

Thus, we solve (60) by applying Algorithm 1 of Chambolle and Pock (2011). In this case, the steps of the algorithm can be written as follows:

  • Initialize Inline graphic

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:

graphic file with name M610.gif 62
graphic file with name M611.gif 63
graphic file with name M612.gif 64
graphic file with name M613.gif 65

We use the following step-sizes, which guarantee the convergence of this algorithm too:

graphic file with name M614.gif 66

Inline graphic is the following upper bound on the operator norm of Inline graphic (61):

graphic file with name M617.gif 67

where Inline graphic is given by (50).

Footnotes

1

The parametric warp functions used in this work include Thin Plate Spline (TPS) and Free-Form Deformations (FFD) based on 2D Cubic B-Splines.

2

After the linearization of the brightness constancy term.

3

In Weickert and Schnörr (2001a) this design principle is expressed for the classical optical flow case where the input is a single pair of frames, but here we present its straightforward extension to the case of multiple frames.

4

By specific reference point we mean that we associate the new location (after rotation) of a point on the reference image with its original location.

5

Note that, for the sake of clarity in our presentation, the generic robust function Inline graphic defined here differs from the robust function Inline graphic that we used in Sect. 4: Inline graphic is applied directly to the vectorial differences whereas Inline graphic is applied to their squared norms. The two definitions are linked by: Inline graphic

6

Note that we have normalized the image intensity values to lie between Inline graphic and Inline graphic

7

Videos of the results as well as our benchmark dataset can be found on the following URL: http://www.eecs.qmul.ac.uk/~lourdes/subspace_flow

8

Note that, as we discussed in Sect. 4, mfsf Inline graphic and ITV-Inline graphic (Wedel et al. 2009) are equivalent algorithms and should therefore provide the same results. The difference in the numerical results is due to two factors: (i) in mfsf Inline graphic, Inline graphic and Inline graphic; (ii) the ITV-Inline graphic algorithm was run with its default parameters and mfsf Inline graphic with the tuned parameters described above.

9

We choose the reference frame to be one in which the points we are interested in tracking are all visible and also to reduce the maximum displacements.

Contributor Information

Ravi Garg, Email: rgarg@eecs.qmul.ac.uk.

Anastasios Roussos, Email: troussos@eecs.qmul.ac.uk.

Lourdes Agapito, Email: lourdes@dcs.qmul.ac.uk.

References

  1. Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2008). Nonrigid structure from motion in trajectory space. In Neural Information Processing Systems, pp. 41–48.
  2. Akhter I, Sheikh Y, Khan S, Kanade T. Trajectory space: A dual representation for nonrigid structure from motion. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(7):1442–1456. doi: 10.1109/TPAMI.2010.201. [DOI] [PubMed] [Google Scholar]
  3. Alvarez L., Esclarín J., Lefébure M., Sánchez J. (1999). A PDE model for computing the optical flow. In Proceedings of the XVI Congreso de Ecuaciones Diferenciales y Aplicaciones (pp. 1349–1356). Las Palmas de Gran Canaria, Spain, Sept. 1999.
  4. Alvarez, L., Weickert, J., & Sánchez, J. (Aug. 2000). Reliable estimation of dense optical flow fields with large displacements. International Journal of Computer Vision, 39(1), 41–56.
  5. Aubert G, Deriche R, Kornprobst P. Computing optical flow via variational techniques. SIAM Journal on Applied Mathematics. 1999;60(1):156–182. doi: 10.1137/S0036139998340170. [DOI] [Google Scholar]
  6. Baker S, Scharstein D, Lewis J, Roth S, Black M, Szeliski R. A database and evaluation methodology for optical flow. International Journal of Computer Vision. 2011;92:1–31. doi: 10.1007/s11263-010-0390-2. [DOI] [Google Scholar]
  7. Bartoli, A., Gay-Bellile, V., Castellani, U., Peyras, J., Olsen, S. I., & Sayd, P. (2008). Coarse-to-fine low-rank structure-from-motion. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
  8. Blomgren, P., & Chan, T. (1998). Color TV: Total variation methods for restoration of vector-valued images. IEEE Transactions on Image Processing, 7(3), 304–309. Special issue on partial differential equations and geometry-driven diffusion in image processing and analysis.
  9. Brand, M. (2001). Morphable models from video. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 456–463.
  10. Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 690–696.
  11. Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In T. Pajdla & J. Matas (Eds.), European Conference on Computer Vision—ECCV 2004, Part IV (LNCS, Vol. 3024, pp. 25–36). Berlin: Springer.
  12. Brox, T., & Malik, J. (2011). Large displacement optical flow: Descriptor matching in variational motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3), 500–513.
  13. Chambolle, A. (2004). An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20, 89–97. doi:10.1023/B:JMIV.0000011320.81911.38.
  14. Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120–145.
  15. Deriche, R., Kornprobst, P., & Aubert, G. (1995). Optical-flow estimation while preserving its discontinuities: A variational approach. In Proceedings of the Second Asian Conference on Computer Vision (Vol. 2, pp. 290–295). Singapore, Dec. 1995.
  16. Esser, E., Zhang, X., & Chan, T. F. (2010). A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4), 1015–1046. doi:10.1137/09076934X.
  17. Garg, R., Pizarro, L., Rueckert, D., & Agapito, L. (2010). Dense multi-frame optic flow for non-rigid objects using subspace constraints. In Asian Conference on Computer Vision, pp. 460–473.
  18. Garg, R., Roussos, A., & Agapito, L. (2011). Robust trajectory-space TV-L1 optical flow for non-rigid sequences. In 8th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pp. 300–314.
  19. Horn, B., & Schunck, B. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203. doi:10.1016/0004-3702(81)90024-2.
  20. Irani, M. (2002). Multi-frame correspondence estimation using subspace constraints. International Journal of Computer Vision, 48(3), 173–194.
  21. Kumar, A., Tannenbaum, A. R., & Balas, G. J. (1996). Optic flow: A curve evolution approach. IEEE Transactions on Image Processing, 5(4), 598–610.
  22. Liu, C., Yuen, J., & Torralba, A. (2011). SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978–994.
  23. Lucas, B., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence.
  24. Newcombe, R., Lovegrove, S., & Davison, A. (2011). DTAM: Dense tracking and mapping in real-time. In International Conference on Computer Vision, pp. 2320–2327.
  25. Nir, T., Bruckstein, A. M., & Kimmel, R. (2008). Over-parameterized variational optical flow. International Journal of Computer Vision, 76, 205–216.
  26. Papadakis, N., Corpetti, T., & Mémin, E. (2007). Dynamically consistent optical flow estimation. In International Conference on Computer Vision (pp. 1–7). Rio de Janeiro, Brazil.
  27. Papenberg, N., Bruhn, A., Brox, T., Didas, S., & Weickert, J. (2006). Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision, 67(2), 141–158.
  28. Pizarro, D., & Bartoli, A. (2010). Feature-based deformable surface detection with self-occlusion reasoning. In International symposium on 3D data processing, visualization and transmission, 3DPVT’10.
  29. Pock, T., & Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In International Conference on Computer Vision, pp. 1762–1769.
  30. Pock, T., Cremers, D., Bischof, H., & Chambolle, A. (2010). Global solutions of variational models with convex regularization. SIAM Journal on Imaging Sciences, 3(4), 1122–1145.
  31. Rakêt, L. L., Roholm, L., Nielsen, M., & Lauze, F. (2011). TV-L1 optical flow for vector valued images. In 8th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pp. 329–343.
  32. Ricco, S., & Tomasi, C. (2012). Dense lagrangian motion estimation with occlusions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1800–1807.
  33. Rockafellar, R. T. (1997). Convex analysis. Princeton Landmarks in Mathematics. Princeton, NJ: Princeton University Press. Reprint of the 1970 original.
  34. Rudin, L., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60, 259–268. doi:10.1016/0167-2789(92)90242-F.
  35. Sapiro, G. (1997). Color snakes. Computer Vision and Image Understanding, 68(2), 247–253.
  36. Schnörr, C. (1994). Segmentation of visual motion by minimizing convex non-quadratic functionals. In Proceedings of the twelfth international conference on pattern recognition (Vol. A, pp. 661–663). Jerusalem, Israel, Oct. 1994. IEEE Computer Society Press.
  37. Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600.
  38. Steinbruecker, F., Pock, T., & Cremers, D. (2009). Large displacement optical flow computation without warping. In International Conference on Computer Vision, pp. 1609–1614.
  39. Stuehmer, J., Gumhold, S., & Cremers, D. (2010). Real-time dense geometry from a handheld camera. In Pattern recognition (Proc. DAGM) (pp. 11–20), September 2010.
  40. Sun, D., Roth, S., Lewis, J. P., & Black, M. (2008). Learning optical flow. In European Conference on Computer Vision, pp. 83–97.
  41. Tian, Y., & Narasimhan, S. (2010). A globally optimal data-driven approach for image distortion estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1277–1284.
  42. Torresani, L., & Bregler, C. (2002). Space-time tracking. In European Conference on Computer Vision, pp. 801–812.
  43. Torresani, L., Hertzmann, A., & Bregler, C. (2008). Non-rigid structure-from-motion: Estimating shape and motion with hierarchical priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5), 878–892.
  44. Torresani, L., Yang, D., Alexander, E., & Bregler, C. (2001). Tracking and modeling non-rigid objects with rank constraints. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 493–500.
  45. Tschumperlé, D., & Deriche, R. (2005). Vector-valued image regularization with PDE's: A common framework for different applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4), 506–517. doi:10.1109/TPAMI.2005.87.
  46. Uras, S., Girosi, F., Verri, A., & Torre, V. (1988). A computational approach to motion perception. Biological Cybernetics, 60, 79–87. doi:10.1007/BF00202895.
  47. Varol, A., Salzmann, M., Tola, E., & Fua, P. (2009). Template-free monocular reconstruction of deformable surfaces. In International Conference on Computer Vision, pp. 1811–1818.
  48. Volz, S., Bruhn, A., Valgaerts, L., & Zimmer, H. (2011). Modeling temporal coherence for optical flow. In International Conference on Computer Vision, pp. 1116–1123.
  49. Wedel, A., Cremers, D., Pock, T., & Bischof, H. (2009). Structure- and motion-adaptive regularization for high accuracy optic flow. In International Conference on Computer Vision, pp. 1663–1668.
  50. Wedel, A., Pock, T., Braun, J., Franke, U., & Cremers, D. (2008). Duality TV-L1 flow with fundamental matrix prior. In Image and Vision Computing New Zealand, pp. 1–6.
  51. Wedel, A., Pock, T., Zach, C., Bischof, H., & Cremers, D. (2009). An improved algorithm for TV-L1 optical flow. In Statistical and geometrical approaches to visual motion analysis, LNCS (pp. 23–45). Springer, Berlin.
  52. Weickert, J. (1998). On discontinuity-preserving optic flow. In S. Orphanoudakis, P. Trahanias, J. Crowley, & N. Katevas (Eds.), Proceedings of the computer vision and mobile robotics workshop (pp. 115–122). Santorini, Greece, Sept 1998.
  53. Weickert, J., & Schnörr, C. (2001). A theoretical framework for convex regularizers in PDE-based computation of image motion. International Journal of Computer Vision, 45(3), 245–264.
  54. Weickert, J., & Schnörr, C. (2001). Variational optic flow computation with a spatio-temporal smoothness constraint. Journal of Mathematical Imaging and Vision, 14(3), 245–255.
  55. Werlberger, M., Trobin, W., Pock, T., Wedel, A., Cremers, D., & Bischof, H. (2009). Anisotropic Huber-L1 optical flow. In British Machine Vision Conference, Vol. 34, pp. 1–11.
  56. White, R., Crane, K., & Forsyth, D. (2007). Capturing and animating occluded cloth. ACM Transactions on Graphics.
  57. Zach, C., Pock, T., & Bischof, H. (2007). A duality based approach for realtime TV-L1 optical flow. In Pattern recognition (Proc. DAGM), pp. 214–223.

Articles from International Journal of Computer Vision are provided here courtesy of Springer
