Skip to main content
Springer logoLink to Springer
. 2013 Apr 2;104(3):286–314. doi: 10.1007/s11263-012-0607-7

A Variational Approach to Video Registration with Subspace Constraints

Ravi Garg 1, Anastasios Roussos 1, Lourdes Agapito 1,
PMCID: PMC3724559  PMID: 23908564

Abstract

This paper addresses the problem of non-rigid video registration, or the computation of optical flow from a reference frame to each of the subsequent images in a sequence, when the camera views deformable objects. We exploit the high correlation between 2D trajectories of different points on the same non-rigid surface by assuming that the displacement of any point throughout the sequence can be expressed in a compact way as a linear combination of a low-rank motion basis. This subspace constraint effectively acts as a trajectory regularization term leading to temporally consistent optical flow. We formulate it as a robust soft constraint within a variational framework by penalizing flow fields that lie outside the low-rank manifold. The resulting energy functional can be decoupled into the optimization of the brightness constancy and spatial regularization terms, leading to an efficient optimization scheme. Additionally, we propose a novel optimization scheme for the case of vector valued images, based on the dualization of the data term. This allows us to extend our approach to deal with colour images which results in significant improvements on the registration results. Finally, we provide a new benchmark dataset, based on motion capture data of a flag waving in the wind, with dense ground truth optical flow for evaluation of multi-frame optical flow algorithms for non-rigid surfaces. Our experiments show that our proposed approach outperforms state of the art optical flow and dense non-rigid registration algorithms.

Introduction

Optical flow in the presence of non-rigid deformations is a challenging task and an important problem that continues to attract significant attention from the computer vision community. It has wide ranging applications from medical imaging and video augmentation to non-rigid structure from motion. Given a template image of a non-rigid object and an input image of it after deforming, the task can be described as one of finding the displacement field (warp) that relates the input image back to the template. In this paper we consider long video sequences instead of a single pair of frames—each of the images in the sequence must be aligned back to the reference frame. Our work concerns the estimation of the vector field of displacements that maps pixels in the reference frame to each image in the sequence (see Fig. 1).

Fig. 1.

Fig. 1

Video registration is equivalent to the problem of estimating dense optical flow Inline graphic between a reference frame Inline graphic and each of the subsequent frames Inline graphic in a sequence. We propose a multi-frame optical flow algorithm that exploits temporal consistency by imposing subspace constraints on the 2D image trajectories

Two significant difficulties arise. First, the image displacements between the reference frame and subsequent ones are large since we deal with long sequences. Secondly, as a consequence of the non-rigidity of the motion, multiple warps can explain the same pair of images causing ambiguity. In this paper we show that a multi-frame approach allows us to exploit temporal information, resolving these ambiguities and improving the overall quality of the optical flow. We make use of the strong correlation between 2D trajectories of different points on the same non-rigid surface. These trajectories lie on a lower dimensional subspace and we assume that the trajectory vector storing 2D positions of a point across time can be expressed compactly as a linear combination of a low-rank motion basis. This leads to a significant reduction in the dimensionality of the problem while implicitly imposing some form of temporal smoothness. Figure 2 depicts the lower dimensional trajectory subspace.

Fig. 2.

Fig. 2

The strong correlation between 2D trajectories of different points on the same non-rigid surface can be exploited to impose temporal coherence by modelling long term temporal coherence imposing subspace constraints. These trajectories lie on a lower dimensional manifold which leads to a significant reduction in the dimensionality of the problem while implicitly imposing some form of temporal smoothness

Subspace constraints have been used before both in the context of sparse point tracking (Irani 2002; Brand 2001; Torresani et al. 2001; Torresani and Bregler 2002) and optical flow (Irani 2002; Garg et al. 2010) in the rigid and non-rigid domains, to allow correspondences to be obtained in low textured areas. While Irani’s original rigid (Irani 2002) formulation along with its non-rigid extensions (Torresani et al. 2001; Brand 2001; Torresani and Bregler 2002) relied on minimizing the linearized brightness constraint without smoothness priors, Garg et al. (2010) extended the subspace constraints to the continuous domain in the non-rigid case using a variational approach. Nir et al. (2008) propose a variational approach to optical flow estimation based on a spatio-temporal model. However, all of the above approaches impose the subspace constraint as a hard constraint. Hard constraints are vulnerable to noise in the data and can be avoided by substituting them with principled robust constraints.In this paper we extend the use of multi-frame temporal smoothness constraints within a variational framework by providing a more principled energy formulation with a robust soft constraint which leads to improved results. In practice, we penalize deviations of the optical flow trajectories from the low-rank subspace manifold, which acts as a temporal regularization term over long sequences. We then take advantage of recent developments (Chambolle 2004; Chambolle and Pock 2011) in variational methods and optimize the energy defining a variant of the duality-based efficient numerical optimization scheme. We are also able to prove that our soft constraint is preferable to a hard constraint imposed via reparameterization. To do this we provide a formulation of the hard constraint and its optimization and we perform thorough experimental comparisons where we show that the results obtained via the soft constraint always outperform those obtained after reparameterization.

The paper is organized as follows. In Sect. 2 we describe related approaches and discuss the contributions of our work. Section 3 defines the trajectory subspace constraints that we use in our formulation. In Sect. 4 we describe the energy and provide a discussion on the design of our effective trajectory regularizer. Section 5 addresses the optimization of our proposed energy. This is followed by a description of the estimation of the motion basis in Sect. 6. In Sect. 7 we propose the extension of our algorithm to vector-valued images and Sect. 8 discusses implementation details. Finally Sect. 9 describes the alternative formulation of the subspace constraint as a hard constraint while Sect. 10 describes our experimental evaluation.

Related Work and Contribution

Variational methods formulate the optical flow or image alignment problems as the optimization of an energy functional in a continuous domain. Stemming from Horn and Schunck’s original approach (Horn and Schunck 1981), the energy incorporates a data term that accounts for the brightness constancy assumption and a regularization term that allows to fill-in flow information in low textured areas. Variational methods have seen a huge surge in recent years due to the development of more sophisticated and robust data fidelity terms which are robust to changes in image brightness or occlusions (Brox and Malik 2011; Brox et al. 2004); the addition of efficient regularization terms such as Total Variation (TV) (Zach et al. 2007; Wedel et al. 2008) or temporal smoothing terms (Weickert and Schnörr 2001b); and new optimization strategies that allow computation of highly accurate (Wedel et al. 2009) and real time optical flow (Zach et al. 2007) even in the presence of large displacements (Alvarez et al. 2000; Brox and Malik 2011; Steinbruecker et al. 2009).

One important recent advance in variational optical flow methods has been the development of the duality based efficient optimization of the so-called TV-Inline graphic formulation (Zach et al. 2007; Chambolle and Pock 2011) (which owes its name to the Total Variation that is used for regularization and the robust Inline graphic-norm that is used in the data fidelity term). An example of this class is the Improved TV-Inline graphic (ITV-Inline graphic) method (Wedel et al. 2009), which yielded notable quantitative performance, by also carefully considering some practical aspects of the optical flow algorithm.Duplication of the optimization variable via a quadratic relaxation is used to decouple the linearized data and regularization terms, decomposing the optimization problem into two, each of which is a convex energy that can be solved in a globally optimal manner. The minimization algorithm then alternates between solving for each of the two variables assuming the other one fixed. One of the key advantages of this decoupling scheme is that since the data term is point-wise independent, its optimization can be highly parallelized using graphics hardware (Zach et al. 2007). Following its success in optical flow computation, this optimization scheme has since been successfully applied to motion and disparity estimation (Pock et al. 2010) and real time dense 3D reconstruction (Newcombe et al. 2011; Stuehmer et al. 2010). In this work we adopt this efficient duality based TV-Inline graphic optimization scheme (Zach et al. 2007) and extend it to the case of multi-frame optical flow for video registration, by modelling long term temporal coherence imposing subspace constraints.

Despite being such a powerful cue most optical flow algorithms do not take advantage of temporal coherence and only work on pairs of images. Few previous attempts to multi-frame optical flow estimation exist in the literature (Weickert and Schnörr 2001b, a; Papadakis et al. 2007; Nir et al. 2008; Werlberger et al. 2009; Volz et al. 2011). Even in those cases, temporal smoothness constraints are only exploited over a very small number of frames (typically Inline graphic or Inline graphic frames either side of the current image) and not for an entire sequence. This is mostly due to the difficulty of providing an explicit model for longer term trajectories. In recent work Volz et al. (2011) report improvements in optical flow computation by imposing first and second order trajectory smoothness over Inline graphic frames. We take this further and exploit temporal coherence throught the entire video. Moreover, while previous approaches incorporate explicit temporal smoothness regularization terms over a few frames, our subspace constraint acts as an implicit long term trajectory regularization term leading to temporally consistent optical flow.

Our approach is related to the recent work of Garg et al. (2010) in which dense multi-frame optical flow for non-rigid motion is computed under hard subspace constraints. Our approach departs in a number of ways. First, while Garg et al. (2010) imposes the subspace constraint via reparameterization of the optical flow, we use a soft constraint and optimize over two sets of closely coupled flows, one that lies on the low-rank manifold and one that does not. Secondly, our use of a robust penalizer for the data term allows us to have more resilience than Garg et al. (2010) against occlusions and appearance changes. Moreover, our use of a modified Total Variation regularizer instead of the non-robust Inline graphic-norm and quadratic regularizer used by Garg et al. (2010) allows to preserve object boundaries. Finally, by providing a generalization of the subspace constraint, we have extended the approach to deal with any orthonormal basis and not just the PCA basis. More recently Ricco and Tomasi (2012) also proposed the use of subspace constraints to model multi-frame optical flow with explicit reasoning for occlusions. However, their approach is restricted to hard subspace constraints with a known PCA basis which is computed from sparse feature tracking.

Non-rigid image registration, has recently seen substantial progress in its robust estimation in the case of severe deformations and large baselines both from keypoint-based and learning based approaches. Successful keypoint-based approaches to deformable image registration include the parametric1 approach of Pizarro and Bartoli (2010) who propose a warp estimation algorithm that can cope with wide baseline and self-occlusions using a piecewise smoothness prior on the deforming surface. A direct approach that uses all the pixels in the image is used as a refinement step. Discriminative approaches on the other hand, learn the mapping that predicts the deformation parameters given a distorted image but require a large number of training samples. In recent work, Tian and Narasimhan (2010) combine generative and discriminative approaches which results in lowering the total number of training samples.

Our contribution

In this paper we adopt a robust approach to non-rigid image alignment where instead of imposing the hard constraint that the optical flow must lie on the low-rank manifold (Garg et al. 2010), we penalize flow fields that lie outside it. Formulating the manifold constraint as a soft constraint using variational principles (Garg et al. 2011) leads to an energy with a quadratic term that allows us to adopt a decoupling scheme, related to the one described above (Zach et al. 2007; Chambolle and Pock 2011), for its efficient optimization. We propose a new anisotropic trajectory regularization term, parameterized in terms of the basis coefficients, instead of the full flow field. This results in an important dimensionality reduction in this term, which is usually the bottleneck of other quadratic relaxation duality based approaches (Zach et al. 2007; Chambolle and Pock 2011). Moreover, the optimization of our regularization step can be highly parallelized due to the independence of the orthonormal basis coefficients adding further advantages to previous approaches. Our approach can be seen as an extension of Zach et al. (2007) efficient TV-Inline graphic flow estimation algorithm to the case of multi-frame non-rigid optical flow, where the addition of subspace constraints acts as a temporal regularization term. In practice, our approach is equivalent to Zach et al. (2007) in the degenerate case where the identity matrix is chosen as the motion basis.

We take advantage of the high level of parallelism inherent to our approach by developing a GPU implementation using the Nvidia CUDA framework. This parallel implementation vastly outperforms the equivalent Matlab code.

Additionally, we provide an extension of our multi-frame approach to the case of vector-valued images which allows us to use the information from all colour channels in image sequences, and further improve results. Our novel optimization scheme is based on the dualization of the linearized data term. Unlike Râket et al.’s previous attempt to extend TV-Inline graphic flow to vector valued images (Rakêt et al. 2011), our new algorithm is not restricted to the use of the Inline graphic-norm penaliser and instead allows the use of more general convex robust penalizers in the data term.

Currently, there are no benchmark datasets for the evaluation of optical flow that include long sequences of non-rigid deformations. In particular, the most popular one (Baker et al. 2011) (Middlebury) does not incorporate any such sequences. To facilitate the quantitative evaluation of multi-frame non-rigid registration and optical flow and to promote progress in this area, we provide a new dataset based on motion capture data of a flag waving in the wind, with dense ground truth optical flow.

Our quantitative evaluation on this dataset using different motion bases shows that our proposed approach improves on state of the art algorithms including large displacement (Brox and Malik 2011) and duality based (Zach et al. 2007) optical flow algorithms and the parametric dense non-rigid registration approach of Pizarro and Bartoli (2010).

Multi-frame Image Registration

Consider a video sequence of non-rigid objects moving and deforming in 3D. In the classical optical flow problem, one seeks to estimate the vector field of image point displacements independently for each pair of consecutive frames. In this paper, we adopt the following multi-frame reformulation of the problem. Taking one frame as the reference template, typically the first frame, our goal is to estimate the 2D trajectories of every point visible in the reference frame over the entire sequence, using a multi-frame approach (Fig. 1 illustrates our approach). The use of temporal information in this way allows us to predict the location of points not visible in a particular frame making us robust to self-occlusions or external occlusions by other objects.

Low-Rank Trajectory Space

To solve the multi-frame optical flow problem, we make use of the fact that the 2D image trajectories of points on an object are highly correlated, even when the object is deforming. We model this property by assuming that the trajectories lie near a low-dimensional linear subspace. This assumption is analogous to the non-rigid low-rank shape model, first proposed by Bregler et al. (2000), which states that the time varying 3D shape of a non-rigid object can be expressed as a linear combination of a low-rank shape basis. This rank constraint has been successfully exploited for 3D reconstruction by Non-Rigid Structure from Motion (NRSfM) algorithms (Torresani et al. 2008) where the matrix of 2D tracks is factorized into the product of two low-rank matrices: a motion matrix that describes the camera pose and time varying coefficients and a shape matrix that encodes the basis shapes.

The low-rank shape basis model of Bregler et al. (2000), Torresani et al. (2008) exploits the spatial properties of non-rigid motion, introducing rank constraints on the 3D location of the set of points (shape) at any given frame. Interestingly, the dual formulation of this model states that the rank constraint can be instead applied to the trajectories of each individual point, modelling them as a linear combination of basis trajectories. Therefore, the motion and shape matrices can exchange their roles as basis and coefficients and we can either interpret the 2D tracks as the projection of a linear combination of 3D basis shapes or as the linear combination of a 2D motion basis. This concept of non-rigid trajectory basis was first introduced in 2D by Torresani and Bregler (2002) who applied it to non-rigid 2D tracking as an extension of the rigid subspace constraints proposed by Irani (2002). Later Akhter et al. (2008, 2011) extended the trajectory basis to 3D to model non-rigid 3D trajectories using the Discrete Cosine Transform (DCT) basis.

Dense Trajectory Subspace Constraints

This paper extends the use of 2D trajectory subspace constraints to the case of estimating dense multi-frame optic flow using a variational approach.

More precisely, we assume that the input image sequence has Inline graphic frames and the Inline graphic-th frame, Inline graphic has been chosen as the reference. We denote by Inline graphic the image domain and we define the function:

graphic file with name M20.gif 1

that represents the point trajectories in the following way. For every visible point Inline graphic in the reference image, Inline graphic is its discrete-time 2D trajectory over all frames of the sequence. The coordinates of each trajectory Inline graphic are expressed with respect to the position of the point Inline graphic at Inline graphic which means that Inline graphic and that the location of the same point in frame Inline graphic is Inline graphicWe use the term multi-frame optical flow to describe Inline graphic since it corresponds to a multi-frame extension of the conventional optical flow: the latter is given by Inline graphic in the degenerate case where the sequence contains only Inline graphic frames and the first one is considered as the reference (Inline graphic).

Mathematically, the robust linear subspace constraint on the 2D trajectories Inline graphic can be expressed in the following way. For all Inline graphic and Inline graphic:

graphic file with name M36.gif 2

which states that the trajectory Inline graphic of any point Inline graphic can be approximated as the linear combination of Inline graphic basis trajectories Inline graphic that are independent from the point location. We include a modeling error Inline graphic which will allow us to impose the subspace constraint as a penalty term.Normally the values of Inline graphic are relatively small, yet sufficient to improve the robustness of the multi-frame optical flow estimation.

Note that we consider that the chosen trajectory basis is orthonormal. We refer to the linear span of these basis trajectories as a trajectory subspace and denote it by Inline graphic The linear combination is controlled by coefficients Inline graphic that depend on Inline graphic therefore we can interpret the collection of all the coefficients for all the points Inline graphic as a vector-valued image Inline graphic Figure 3 illustrates the subspace constraint.

Fig. 3.

Fig. 3

The displacement of any point throughout the sequence can be expressed in a compact way as a linear combination of a low-rank trajectory basis. The basis vectors Inline graphic encode the temporal information while the coefficient maps Inline graphic describe the spatial distribution of the individual basis trajectories

In many cases, effective choices for the model order (or rank) Inline graphic correspond to values smaller than Inline graphic which means that the above representation is compact and achieves a significant dimensionality reduction on the point trajectories.

We now re-write equation (2) in matrix notation, which will be useful in the subsequent presentation. Let Inline graphic and Inline graphic Inline graphic be equivalent representations of the functions Inline graphic and Inline graphic that are derived by vectorizing the dependence on the discrete time Inline graphic and let Inline graphic be the trajectory basis matrix whose columns contain the basis elements Inline graphic after vectorizing them in the same way:

graphic file with name M60.gif 3

The subspace constraint (2) can now be written as follows:

graphic file with name M61.gif 4

Non-Rigid Video Registration from Multi-frame Optical Flow

Let Inline graphic be the sequence of grayscale image frames, which are given either directly from the input frames or from the input frames after some preprocessing, such as structure-texture decomposition (Wedel et al. 2009).

In our formulation, the estimation of the multi-frame optical flow is equivalent to the simultaneous registration of all the frames with the reference frame Inline graphic: Recall that for every frame Inline graphic the coordinates Inline graphic yield the current location of any image point Inline graphic of the reference. Therefore, the image:

graphic file with name M67.gif 5

is the registered version of the image Inline graphic back to the reference Inline graphic or in other words it is the warping of the image Inline graphic to the image Inline graphic As it will be described later, we expect that the brightness differences between every registered image and the reference image to be small and therefore we use an appropriate brightness constancy term in our proposed energy.

Variational Multi-frame Optical Flow Estimation

In this section we show how dense motion estimation can be combined with the trajectory subspace constraints described in Sect. 3. In order to estimate the 2D trajectories of all the points, or equivalently simultaneously register all the frames with the reference frame Inline graphic we propose the following energy:

graphic file with name M73.gif 6

where

graphic file with name M74.gif 7
graphic file with name M75.gif 8
graphic file with name M76.gif 9

We minimize this energy jointly with respect to the point trajectories Inline graphic and their components on the trajectory subspace that are determined by the linear model coefficients Inline graphic We also add the constraint that Inline graphic since this corresponds to the flow from the reference image frame to itself. The positive constants Inline graphic and Inline graphic weigh the balance between the terms of the energy. Also, Inline graphic in (9) denotes the Huber norm of a vector and Inline graphic is a space-varying weighting function (see Sect. 4 for more details).

Note that the functions Inline graphic and Inline graphic determine two sets of trajectories that are relatively close to each other but not identical since the subspace constraint is imposed as a soft constraint.This improves the robustness of our method against overfitting to the image data in cases where the brightness constancy assumption fails. For this reason, we consider that the final output of our method are the trajectories Inline graphic that lie on the trajectory subspace and are directly derived by the coefficients Inline graphic

Description of the Energy

In this section we provide more details about the properties of the proposed energy (6).

The first term (Inline graphic) is a data attachment term that uses the robust Inline graphic-norm and is a direct multi-frame extension of the brightness constancy term used by most optical flow methods, e.g. Zach et al. (2007). It is based on the assumption that the image brightness Inline graphic at every pixel Inline graphic of the reference frame is preserved at its new location, Inline graphic in every frame of the sequence. The use of an Inline graphic-norm improves the robustness of the method since it allows deviations from this assumption, which might occur in real-world scenarios because of noise, illumination changes or occlusions of some points in some frames.

The second term (Inline graphic) penalizes the difference between the two sets of trajectories Inline graphic and Inline graphic and acts as a coupling (linking) term between them. This term serves as a soft constraint that the trajectories Inline graphic should be relatively close to the subspace spanned by the basis Inline graphicConcerning the weight Inline graphic the larger its value the more restrictive the subspace constraint becomes. Since the subspace of Inline graphic is low-dimensional, this constraint operates also as a temporal regularization that is able to perform temporal filling-in in cases of occlusions or other distortions.

An equivalent interpretation is that this term is derived from the constraint that the error Inline graphic in (2) has a bounded Inline graphic norm, i.e. Inline graphic for some appropriate constant Inline graphic Then Inline graphic corresponds to the Lagrange multiplier for this constraint.

The third term (Inline graphic) corresponds to the spatial regularization of the trajectory coefficients. This term penalizes spatial oscillations of each coefficient caused by image noise or other distortions but not strong discontinuities that are desirable in the borders of each object. In addition, this term allows to fill in textural information into flat regions from their neighbourhoods. Following Werlberger et al. (2009), Newcombe et al. (2011), we use the Huber norm over the gradient of each subspace coefficient Inline graphic which is defined as:

graphic file with name M108.gif 10

where Inline graphic is a relatively small constant. The Huber norm is a convex differentiable function that combines quadratic regularization in the interval Inline graphic with Total Variation regularization outside the interval.For small gradient magnitudes the Huber norm offers smooth solutions, whereas for larger magnitudes the discontinuity preserving properties of Total Variation are maintained. Following Alvarez et al. (1999), Wedel et al. (2009), Newcombe et al. (2011), we also incorporate a space-varying weight Inline graphic that depends on the reference image as follows:

graphic file with name M112.gif 11

where Inline graphic is a constant and Inline graphic is the standard deviation of the 2D Gaussian Inline graphic that convolves the reference image Inline graphic This weight encourages discontinuities in flow to coincide with edges of the reference image by reducing the regularisation strength near those edges.Further discussion on our proposed regularization term Inline graphic is provided in Sect. 4.

Connections to Previous Work

Interestingly, our adopted strategy of estimating two sets of trajectories, Inline graphic and Inline graphic resembles the techniques of quadratic relaxation and duplication of the optimization variable that have been previously used in the context of optical flow and depth map estimation (Zach et al. 2007; Pock et al. 2010; Stuehmer et al. 2010; Newcombe et al. 2011). Similarly, we benefit from the fact that the optimization problem can be decomposed into two parts, each of which is a convex energy2 that can be solved efficiently and in a globally optimal manner. However, our formulation offers an additional advantage: the spatial regularization step, which is the bottleneck in these optimization schemes, is computationally much more efficient since it is applied to the coefficients Inline graphic that normally have smaller dimensionality than the flow Inline graphic

Note that there is a degenerate case in which our proposed approach becomes equivalent to independently estimating the flow from the reference Inline graphic to each frame Inline graphic by applying Inline graphic times the ITV-Inline graphic optical flow algorithm (Wedel et al. 2009). This degenerate case occurs when:

  • The motion basis is set to Inline graphic where Inline graphic is the Inline graphic identity matrix, in which case Inline graphic; and

  • Inline graphic and Inline graphic

When Inline graphic and Inline graphic the terms Inline graphic become equivalent to Inline graphic and therefore our regularization term Inline graphic is a summation of Total Variation terms. Furthermore, the choice Inline graphic converts the energy (6) into a summation of Inline graphic decoupled energy terms Inline graphic:

graphic file with name M140.gif 12

Each term Inline graphic corresponds to a specific frame Inline graphic and depends only on Inline graphic and the two coefficients Inline graphic and Inline graphic These coefficients stacked together as a vector-valued function can be seen as the auxiliary variable of Inline graphic so the energy term Inline graphic is equivalent to the convex relaxation of the TV-Inline graphic functional used in Wedel et al. (2009).

Effective Trajectory Regularization

In this section we provide further intuition into our choice of multi-frame optical flow regularization Inline graphic The presentation of this section follows a constructive approach—we build our proposed regularizer from the simplest choice of regularization term in successive steps, each of which adds more complexity but improves its effectiveness. We start by revisiting common practices in the literature and conclude by proposing our novel anisotropic trajectory regularization term in the final step. Our goal is to regularize the multi-frame optical flow Inline graphic that lies on the trajectory subspace. Note that Inline graphic can be interpreted as a vector valued function with Inline graphic channels encoding the horizontal and vertical components of the optical flow at each frame as defined in equation (3).

Step 1. A simple choice would be homogeneous regularization of the multi-frame flow, which is a straightforward multi-frame generalization of the model of Horn and Schunck (1981):

$$\int_{\Omega} \left\| J\mathbf{u}(\mathbf{x}) \right\|_F^2 \, d\mathbf{x} \qquad (13)$$

where ‖·‖_F denotes the Frobenius norm of a matrix and J denotes the Jacobian operator (each row of the Jacobian contains the gradient of the corresponding channel of the flow). However, this regularizer leads to over-smoothing at motion boundaries, since the quadratic term excessively penalizes the large gradient magnitudes that correspond to motion discontinuities.
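As a concrete discretization of this quadratic regularizer, the energy can be evaluated with NumPy as follows; the array shapes and the ramp/step test signals are illustrative assumptions, not the paper's own code:

```python
import numpy as np

def quadratic_regularizer(u):
    """Discrete homogeneous (Horn-Schunck style) regularizer: the squared
    Frobenius norm of the Jacobian of the multi-channel flow, summed over
    all pixels. `u` has shape (2F, H, W): one channel per horizontal or
    vertical flow component of each frame."""
    energy = 0.0
    for channel in u:
        gy, gx = np.gradient(channel)      # per-channel spatial gradient
        energy += np.sum(gx**2 + gy**2)    # |grad u_k|^2 summed over pixels
    return energy

# A sharp motion boundary is penalized much more than a smooth ramp with the
# same total displacement, which is why this regularizer over-smooths edges.
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))[None]  # gradual ramp
step = np.zeros((1, 32, 32)); step[:, :, 16:] = 1.0          # sharp edge
assert quadratic_regularizer(step) > quadratic_regularizer(smooth)
```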

Step 2. A way to avoid this is to apply a robust function φ that penalizes outliers of the gradient less severely than the quadratic penalizer:

$$\int_{\Omega} \varphi\left( \left\| J\mathbf{u}(\mathbf{x}) \right\|_F^2 \right) d\mathbf{x} \qquad (14)$$

This choice is used in Nir et al. (2008), and when only two frames are taken into account it is equivalent to the regularizers used in Schnörr (1994), Weickert (1998), Brox and Malik (2011) (isotropic flow-driven regularization in the terminology of Weickert and Schnörr (2001a)). Examples of the robust function φ include the following:

  • φ(s) = √s, in which case the regularizer is the vectorial total variation (Sapiro 1997) of the vector-valued function that encodes the multi-frame optical flow.

  • φ chosen so that the penalizer is the Huber norm (10), which is the choice adopted in our approach.

The robust function φ in (14) penalizes outliers of the joint gradient norm less strongly and therefore allows discontinuities to occur. However, such outliers correspond only to points where all the channels of the flow display sharp discontinuities. If, for example, only a few channels have a high gradient at a point, then the joint norm is not treated as an outlier, since it remains low (because of the sum of squares over all channels involved in this norm). This regularizer is thus much less tolerant of motion boundaries that occur in individual channels.
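A toy numerical illustration of this dilution effect (the channel count and jump size are made-up numbers): a discontinuity confined to one of the 2F channels produces a joint gradient norm that is √(2F) times smaller than a coherent jump of the same per-channel size, so a robust penalty tuned to treat coherent motion boundaries as outliers will smooth the single-channel edge away.

```python
import numpy as np

F2 = 20                      # 2F channels for an F = 10 frame sequence
jump = 1.0

one_channel = np.zeros(F2); one_channel[0] = jump   # edge in one channel only
all_channels = np.full(F2, jump)                    # coherent edge, all channels

norm_one = np.linalg.norm(one_channel)   # joint norm = jump
norm_all = np.linalg.norm(all_channels)  # joint norm = jump * sqrt(2F)

# The coherent boundary registers sqrt(2F) times more strongly in the joint
# norm, even though the per-channel discontinuity is identical.
assert np.isclose(norm_all, np.sqrt(F2) * norm_one)
```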

Step 3.

The above problem can be addressed by applying the penalizer φ independently to the squared gradient norm of each channel of the flow:

$$\int_{\Omega} \sum_{k=1}^{2F} \varphi\left( \left| \nabla u_k(\mathbf{x}) \right|^2 \right) d\mathbf{x} \qquad (15)$$

This is a direct multi-frame extension of the regularizer used in Deriche et al. (1995), Kumar et al. (1996), Aubert et al. (1999), Zach et al. (2007), Wedel et al. (2009), for which efficient numerical implementations exist (Zach et al. 2007; Wedel et al. 2009). In this way, each channel of the flow can have different boundaries. However, this regularizer lies at the opposite extreme from the regularizer of Step 2: when substantial correlation exists between the different channels, it is ineffective, since it allows correlated trajectories to have different boundaries.

In addition, in contrast to the regularizers proposed in previous steps, it is not rotation invariant (Weickert and Schnörr 2001a).

Step 4.

To avoid the aforementioned problems, we adopt our subspace model for the 2D trajectories and rewrite the norm of the flow's Jacobian as a function of the coefficients:

$$\left\| J\mathbf{u}(\mathbf{x}) \right\|_F^2 = \left\| Q \, J\mathbf{L}(\mathbf{x}) \right\|_F^2 = \sum_{k=1}^{R} \left| \nabla L_k(\mathbf{x}) \right|^2 \qquad (16)$$

where we have used the orthonormality of the basis Q. Provided that the trajectory basis has been chosen appropriately, the coefficients are much less correlated than the channels of the flow. We conclude that it is more effective to apply the robust function φ independently to the basis coefficients (instead of the flow fields), and we derive the regularizer:

$$\int_{\Omega} \sum_{k=1}^{R} \varphi\left( \left| \nabla L_k(\mathbf{x}) \right|^2 \right) d\mathbf{x} \qquad (17)$$

Furthermore, this regularizer leads to a much more efficient implementation for two main reasons. First, the regularization is applied to the coefficients, which typically have lower dimensionality than the flow. Second, this regularization is decoupled for each coefficient and can thus be highly parallelized. Note that the regularizer (15) derived in Step 3 can be considered a special case of the above regularizer, obtained when the 2F × 2F identity matrix is chosen as the basis. However, in our work, we use two choices of basis: DCT and PCA (derived from an initial flow). We now analyze each of these cases separately:

  • When the basis matrix has been estimated by applying PCA to some trajectory samples, the correlation between the coefficients can be considered negligible. Furthermore, in this case we regain the desirable property of rotation invariance, since the proposed regularizer (17) is consistent with the general design principle of Weickert and Schnörr (2001a) for rotationally invariant anisotropic regularizers. According to that principle3, given an appropriate decomposition of the gradient energy into rotationally invariant expressions, one should apply the robust function to each expression separately, yielding a regularizer that is rotationally invariant and anisotropic. In our case, these expressions correspond to the coefficients, which are indeed rotation invariant: if we assume that a rotation of the input frames causes the same rotation to be applied to the trajectory samples, then the basis trajectories will be equally rotated. Therefore, the coefficients of a specific reference image point4 will remain invariant and the corresponding trajectory will simply be rotated.

  • In the case of the DCT basis, the above properties do not hold. However, the regularizer (17) with a DCT basis is much more effective than the regularizer (15), since the DCT frequency components of a trajectory are typically less correlated than its actual coordinates. This is because, when the actual motions of the image points are compositions of different physical motions, these motions are expected to be much more localized in the frequency domain than in the time domain.
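The orthonormality identity behind (16), which makes regularizing the R coefficient channels equivalent in energy to regularizing the 2F flow channels, can be checked numerically. The sketch below uses a random orthonormal basis and illustrative shapes; names are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
F2, R, H, W = 20, 6, 16, 16                # 2F = 20 flow channels, rank R = 6

Q, _ = np.linalg.qr(rng.standard_normal((F2, R)))  # orthonormal 2F x R basis
L = rng.standard_normal((R, H, W))                  # coefficient maps
U = np.einsum('fr,rhw->fhw', Q, L)                  # flow field u = Q L

def jac_frob_sq(v):
    """Sum over pixels of the squared Frobenius norm of the Jacobian."""
    gy, gx = np.gradient(v, axis=(1, 2))
    return np.sum(gx**2 + gy**2)

# Because the gradient is linear and Q has orthonormal columns,
# || J u ||_F^2 == || J L ||_F^2 at every pixel, hence also in total.
assert np.isclose(jac_frob_sq(U), jac_frob_sq(L))
```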

Step 5.

Finally, it is reasonable to assume that the boundaries of all the motion components tend to be a subset of the edges of the reference image. Following Alvarez et al. (1999), Wedel et al. (2009), Newcombe et al. (2011), in order to prevent smoothing across motion boundaries, our final regularizer is weighted by a space-varying function g that depends on the reference image, as described in (11).

In our extensive experiments, we have verified empirically that the introduction of such a weighting improves the accuracy of the multi-frame optical flow. This is in accordance with the experimental evidence reported in Wedel et al. (2009) for classical optical flow.

Optimization of the Proposed Energy

In order to minimize the energy (6), we follow a coarse-to-fine technique with multiple warping iterations (Brox et al. 2004). In every warping iteration, we use an initialization of the flow that comes from the previous iteration, and we approximate the data term (7) by linearizing the images around this initialization. After this approximation, the energy (6) becomes convex.

Following Zach et al. (2007), we implement the optimization of the energy (6) using an alternating approach. We decouple the data and regularization terms to decompose the optimization problem into two subproblems, each of which can be more easily solved. In this section we show how to adapt the method of Zach et al. (2007) to our problem, to take advantage of its computational efficiency and apply it to multi-frame subspace-constrained optical flow. The key difference from Zach et al. (2007) is that we do not solve for pairwise optical flow; instead we optimize over all the frames of the sequence while imposing the trajectory subspace constraint as a soft constraint.

We apply an alternating optimization, updating either Inline graphic or Inline graphic in every iteration, as follows:

  • Repeat until convergence:
      Minimization Step 1: keeping the flow fixed, update the coefficients by minimizing the energy with respect to them.
      Minimization Step 2: keeping the coefficients fixed, update the flow by minimizing the energy with respect to it.

Convergence is declared when the relative update of the flow and the coefficients is negligible according to an appropriate distance threshold. Since the energy value does not increase at any step and is bounded below by its global minimum, the above alternation is guaranteed to converge; as the linearized energy is convex, the limit is a global minimum point.

Minimization Step 1

Since in this step we keep the flow fixed, we observe that only the last two terms of the energy (6), the regularization term and the subspace coupling term, depend on the coefficients. Therefore we must minimize their sum with respect to the coefficients. Using the matrix notation defined in (4), we can write the coupling term as:

$$E_C = \int_{\Omega} \left\| \mathbf{u}(\mathbf{x}) - Q\,\mathbf{L}(\mathbf{x}) \right\|^2 d\mathbf{x} \qquad (18)$$

Let Q⊥ be a 2F × (2F − R) matrix whose columns form an orthonormal basis of the orthogonal complement of the trajectory subspace. Then the block matrix [Q Q⊥] is an orthonormal 2F × 2F matrix, which means that its columns form a basis of the full trajectory space. Consequently, any flow vector can be decomposed into two mutually orthogonal components as

$$\mathbf{u}(\mathbf{x}) = Q\,\mathbf{c}(\mathbf{x}) + Q^{\perp}\,\mathbf{c}^{\perp}(\mathbf{x}) \qquad (19)$$

where

$$\mathbf{c}(\mathbf{x}) = Q^{\top}\mathbf{u}(\mathbf{x}), \qquad \mathbf{c}^{\perp}(\mathbf{x}) = \left( Q^{\perp} \right)^{\top}\mathbf{u}(\mathbf{x}) \qquad (20)$$

are the coefficients that define the projections of the flow onto the trajectory subspace and its orthogonal complement, respectively. Equation (18) can now be further simplified:

$$E_C = \int_{\Omega} \left( \left\| \mathbf{c}(\mathbf{x}) - \mathbf{L}(\mathbf{x}) \right\|^2 + \left\| \mathbf{c}^{\perp}(\mathbf{x}) \right\|^2 \right) d\mathbf{x} \qquad (21)$$

due to the orthonormality of the columns of the basis and its complement (which makes the corresponding transforms isometric) and Pythagoras' theorem. The out-of-subspace component is constant with respect to the coefficients; therefore it can be ignored in the current minimization. In other words, with the flow fixed, penalizing the distance between the flow and a point on the trajectory subspace is equivalent to penalizing the distance between the coefficients of that point and the projection of the flow onto the subspace.
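The decomposition (19)-(21) can be sanity-checked numerically: the distance from a flow vector to any point of the subspace splits, by Pythagoras, into an in-subspace part and a constant out-of-subspace part. Shapes and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
F2, R = 12, 4
Qfull, _ = np.linalg.qr(rng.standard_normal((F2, F2)))
Q, Q_perp = Qfull[:, :R], Qfull[:, R:]   # orthonormal basis and its complement

u = rng.standard_normal(F2)              # a 2F-dimensional trajectory vector
L = rng.standard_normal(R)               # arbitrary subspace coefficients

lhs = np.sum((u - Q @ L)**2)
rhs = np.sum((Q.T @ u - L)**2) + np.sum((Q_perp.T @ u)**2)
assert np.isclose(lhs, rhs)              # Pythagoras: identity (21)

# The minimizing coefficients are simply the projection Q^T u, and the
# residual is exactly the out-of-subspace energy:
assert np.isclose(np.sum((u - Q @ (Q.T @ u))**2), np.sum((Q_perp.T @ u)**2))
```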

Thus, the minimization of Step 1 is equivalent to the minimization of:

$$\sum_{k=1}^{R} \int_{\Omega} \left( g(\mathbf{x}) \left| \nabla L_k(\mathbf{x}) \right|_{\varepsilon} + \alpha \left( L_k(\mathbf{x}) - c_k(\mathbf{x}) \right)^2 \right) d\mathbf{x} \qquad (22)$$

where c_k is the k-th coordinate of the projection c(x) = Qᵀu(x). We have finally obtained a new form of the energy in which the trajectory model coefficients are decoupled. The minimization of each term in the above sum can be done independently and corresponds to a small modification of the TV-L2 Rudin-Osher-Fatemi (ROF) model (Rudin et al. 1992) applied to each coefficient: this modification consists of incorporating an edge weighting g and replacing the norm of the gradient with its Huber version (10). This modified ROF model has recently been studied in Newcombe et al. (2011) for the problem of depth estimation. The optimum coefficient map is actually a regularized version of the corresponding projection c_k, and the extent of this regularization increases as the weight decreases.

The benefits of the computational efficiency of the above procedure are twofold. First, these independent minimizations can be parallelized. Second, several efficient algorithms exist to implement such regularization models. Appendix A describes the actual algorithm we used for the optimization of this energy, which is related to the method proposed in Newcombe et al. (2011).

Minimization Step 2

Keeping the coefficients fixed, we observe that only the first two terms of the energy (6), the data term and the subspace coupling term, depend on the flow, and therefore we have to minimize the following with respect to the flow:

[Equation (23): image not preserved]

where Inline graphic. This cost depends only on the value of the flow at the specific point and discrete time (and not on the derivatives of the flow). Therefore the variational minimization of Step 2 is equivalent to the minimization of a bivariate function of the flow at every spatiotemporal point independently.

We implement this point-wise minimization by applying the technique proposed in Zach et al. (2007) to every frame. More precisely, for every frame and point, the image is linearized around the initialization of the corresponding trajectory. The function to be minimized at every point then has the simple form of a quadratic term plus the absolute value of a linear term. The minimum can be found analytically using the thresholding scheme reported in Zach et al. (2007).
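For reference, the closed-form thresholding step of Zach et al. (2007) for this kind of pointwise problem can be sketched as follows. It solves min_u |u − v|²/(2θ) + λ|ρ(u)| with ρ(u) = r0 + g·(u − u0) the linearized brightness residual at one pixel (g is the image gradient). Variable names and the exact weighting convention are our own assumptions:

```python
import numpy as np

def tv_l1_threshold(v, u0, r0, g, lam, theta):
    """Closed-form minimizer of |u - v|^2/(2*theta) + lam*|r0 + g.(u - u0)|."""
    rho_v = r0 + g @ (v - u0)
    g2 = g @ g
    if rho_v < -lam * theta * g2:
        return v + lam * theta * g      # residual strongly negative: step up
    if rho_v > lam * theta * g2:
        return v - lam * theta * g      # residual strongly positive: step down
    return v - (rho_v / g2) * g if g2 > 0 else v  # small residual: cancel it

# In the small-residual case the linearized residual is cancelled exactly:
v, u0, g = np.array([0.3, -0.1]), np.zeros(2), np.array([1.0, 2.0])
u = tv_l1_threshold(v, u0, r0=-0.05, g=g, lam=0.1, theta=0.25)
assert abs(-0.05 + g @ (u - u0)) < 1e-12
```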

Derivation of the Trajectory Basis

Concerning the choice of 2D trajectory basis, we consider orthonormal bases, since this simplifies the analysis and calculations in our method (see Sect. 4). This assumption is not restrictive, since for any basis an orthonormal one can be found that spans the same subspace. We now describe several effective choices of trajectory basis that we have used in our formulation.

Predefined bases for single-valued discrete-time signals with F samples can be used to model each coordinate of the 2D trajectories separately. Assuming that the rank R is an even number, this single-valued basis should have R/2 elements, and the trajectory basis is given by:

$$Q = B \otimes I_2, \qquad B = \left[ \mathbf{b}_1 \; \cdots \; \mathbf{b}_{R/2} \right] \qquad (24)$$

Provided that the object moves and deforms smoothly, effective choices for the single-valued basis are (i) the first R/2 low-frequency basis elements of the 1D Discrete Cosine Transform (DCT), or (ii) a sampling of the basis elements of the Uniform Cubic B-Splines of rank R/2 over the sequence's time window, followed by orthonormalization of the resulting basis. The obvious advantage of using a predefined basis is that it does not need to be estimated in advance.
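A minimal sketch of building such a predefined trajectory basis from the orthonormal 1D DCT-II; the interleaving of x and y coordinates via the Kronecker product is our assumed convention, not necessarily the paper's exact row ordering:

```python
import numpy as np

def dct_trajectory_basis(F, R):
    """2F x R trajectory basis from the first R/2 low-frequency 1D DCT-II
    elements; rows are assumed ordered (x_1, y_1, ..., x_F, y_F)."""
    assert R % 2 == 0 and R <= 2 * F
    t = np.arange(F)
    B = np.empty((F, R // 2))
    for k in range(R // 2):
        b = np.cos(np.pi * (t + 0.5) * k / F)   # DCT-II basis element
        B[:, k] = b / np.linalg.norm(b)          # normalize each element
    return np.kron(B, np.eye(2))                 # one column each for x and y

Q = dct_trajectory_basis(F=30, R=10)
assert Q.shape == (60, 10)
assert np.allclose(Q.T @ Q, np.eye(10), atol=1e-10)  # orthonormal columns
```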

An alternative is to estimate the basis by applying Principal Component Analysis (PCA) to some sample trajectories. Provided that it is possible to estimate a set of sample trajectories that adequately represent the trajectories of the points over the whole object, the choice of the PCA basis is optimum for the linear model of a given rank Inline graphic in terms of representational power. In this work we consider two possibilities.

  • (i)

    The sample trajectories could come from an initial estimate of optical flow. We have found that the flow obtained using the DCT basis provides a very good initial flow, to which we then apply PCA to obtain an optimized basis.

  • (ii)

    Alternatively, the sample trajectories could be a small subset of reliable point tracks, which we consider to be those where the image texture is strong in both spatial directions; these can be selected using Shi and Tomasi's criterion (Shi and Tomasi 1994). However, this option is not resilient to outliers.

In practice, our experimental evaluation shows that the multi-frame optical flow obtained with the optimized PCA basis proposed in (i) provides the best results. It has the added advantage that, since we initialize the flow using our algorithm with the DCT basis, which is predefined and need not be estimated, the entire process is automated and less affected by outliers.
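Deriving the PCA basis from sample trajectories can be sketched as follows: the samples are stacked as columns of a 2F × N matrix and the R leading left singular vectors are kept. Whether a mean trajectory is subtracted first is not stated in this section; this sketch does not subtract one:

```python
import numpy as np

def pca_trajectory_basis(W, R):
    """W: 2F x N matrix of sample trajectories, one per column.
    Returns an orthonormal 2F x R basis of their principal subspace."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :R]

# Samples drawn from a rank-5 subspace are represented exactly by the
# rank-5 PCA basis (illustrative synthetic data).
rng = np.random.default_rng(2)
F2, true_rank, N = 40, 5, 200
ground = rng.standard_normal((F2, true_rank)) @ rng.standard_normal((true_rank, N))
Q = pca_trajectory_basis(ground, R=5)
assert np.allclose(Q.T @ Q, np.eye(5), atol=1e-10)
assert np.allclose(Q @ (Q.T @ ground), ground, atol=1e-6)
```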

Generalization to Sequences of Vector-Valued Images

The algorithm we have described so far assumes that the images in the sequence are grayscale. In this section we develop a generalization of our approach to the case of sequences of vector-valued images. We propose an optimization scheme that is based on the dualization of the data term of the energy.

The use of vector-valued images can significantly improve the accuracy of the estimated optical flow for several reasons. First of all, vector-valued images can incorporate all the colour channels of an image. The colour cue in a video offers important additional information and resolves ambiguities that are present in grayscale images. Furthermore, this generalization offers the potential for incorporating other powerful image cues as additional channels. For instance, the spatial derivatives of the colour channels can be added to impose the gradient constancy assumption (Uras et al. 1988; Brox et al. 2004; Papenberg et al. 2006; Brox and Malik 2011), or even more complex features such as SIFT features (Liu et al. 2011) or features derived using a Field-of-Experts formulation (Sun et al. 2008), which can improve robustness against illumination changes in the scene. Note that in our experimental evaluation we have only incorporated the colour channels. To cope with illumination changes we have used structure-texture decomposition as a preprocessing step, which is an alternative way to gain robustness (Wedel et al. 2009).

Proposed Dual Formulation

Let us assume that the video frames that are used in our data term are vector-valued images with C channels:

$$I_f : \Omega \rightarrow \mathbb{R}^{C} \qquad (25)$$

To cope with this more general case, we only have to modify two elements of our energy formulation: (i) the data term of the proposed energy (6), and (ii) the edge-weighting function of the regularization term described in (11), which depends on the reference image.

The original definition of the edge-weighting function is based on the gradient magnitude of the reference image, used as a simple edge-strength predictor. For vector-valued images, we use a common and natural extension of this predictor (Blomgren and Chan 1998; Tschumperlé and Deriche 2005) by adding the contributions of the different image channels. We thus generalize the edge-weighting function as follows:

[Equation (26): image not preserved]
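A sketch of such a channel-summed edge weight: the per-channel squared gradient magnitudes are added before being mapped through a decreasing function, so that an edge in any channel lowers the weight. The exponential form g = exp(−α·‖∇I‖^β) and the parameter values here are illustrative assumptions, not the exact definition (11):

```python
import numpy as np

def edge_weight(image, alpha=10.0, beta=1.0):
    """image: (C, H, W) multi-channel reference image.
    Returns a per-pixel weight in (0, 1]: low near edges, 1 in flat regions."""
    mag2 = np.zeros(image.shape[1:])
    for channel in image:
        gy, gx = np.gradient(channel)
        mag2 += gx**2 + gy**2           # sum contributions of all channels
    return np.exp(-alpha * np.sqrt(mag2)**beta)

img = np.zeros((3, 16, 16)); img[0, :, 8:] = 1.0  # edge in one channel only
g = edge_weight(img)
assert g[0, 0] == 1.0      # flat region: full regularization weight
assert g[0, 8] < 0.1       # edge in a single channel still reduces the weight
```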

Concerning the data term, we make a further generalization by applying a generic robust function5 φ to the image differences:

[Equation (27): image not preserved]

Our generalized data term becomes:

$$E_D[\mathbf{u}] = \sum_{f=1}^{F} \int_{\Omega} \varphi\left( I_f\left( \mathbf{x} + \mathbf{u}_f(\mathbf{x}) \right) - I_0(\mathbf{x}) \right) d\mathbf{x} \qquad (28)$$

Since only the data term is affected by the extension to vector-valued images, the optimization of our proposed energy (6) only requires a modification of the minimization with respect to the flow (Step 2 in Sect. 5). As in the case of grayscale images, this minimization is independent for every spatio-temporal point, but the point-wise energy that must be minimized with respect to the flow is now the following:

[Display equation: image not preserved]

For every point in every frame, each channel of the corresponding image is linearized around the initialization of the trajectories, which comes from the previous warping iteration. With this approximation, the point-wise energy can be written as:

[Equation (29): image not preserved]

where Inline graphic and Inline graphic is the Inline graphic (spatial) Jacobian of the Inline graphic-th frame Inline graphic evaluated at Inline graphic

Assuming that the function φ is proper, convex, and lower semi-continuous, we dualize it using its convex bi-conjugate (Rockafellar 1997; Chambolle and Pock 2011):

$$\varphi(\mathbf{z}) = \sup_{\mathbf{p}} \left\{ \langle \mathbf{p}, \mathbf{z} \rangle - \varphi^{*}(\mathbf{p}) \right\} \qquad (30)$$

where φ* is the Legendre-Fenchel transform of the robust function and p is the dual variable. We can now rewrite the energy (29) as:

[Equation (31): image not preserved]

Based on the above expression, we propose to minimize the energy by solving the following saddle point problem:

[Equation (32): image not preserved]

where

[Equation (33): image not preserved]

Given a specific choice for the robust function φ, one can derive efficient algorithms to solve the saddle point problem (32), using a framework similar to Esser et al. (2010), Chambolle and Pock (2011), Pock and Chambolle (2011). In Appendix B we provide such algorithms for two special cases of φ of particular interest:

  • φ equal to the Euclidean norm, which leads to the L2-norm of the image differences in (28). This is the choice that we use in our experiments on colour images.

  • φ equal to the Huber norm (10).
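The bi-conjugate identity (30) can be sanity-checked numerically for the simplest scalar case φ(z) = |z|, whose Legendre-Fenchel conjugate is the indicator of [−1, 1], so that |z| = max over |p| ≤ 1 of p·z. The grid search below is purely illustrative:

```python
import numpy as np

# Feasible dual variables: phi*(p) = 0 for |p| <= 1, +infinity otherwise,
# so the supremum in (30) restricts to p in [-1, 1].
p_grid = np.linspace(-1.0, 1.0, 2001)

def phi_via_biconjugate(z):
    """Recover phi(z) = |z| by brute-force maximization of p*z - phi*(p)."""
    return np.max(p_grid * z)

for z in [-2.5, -0.3, 0.0, 1.7]:
    assert np.isclose(phi_via_biconjugate(z), abs(z))
```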

Note that Rakêt et al. (2011) recently proposed an extension of the TV-L1 algorithm to vector-valued images. Their method corresponds to the choice of the Euclidean norm for φ and uses a step of projection onto an elliptic ball. The formulation that we propose in this section can be seen as an alternative to that work; its advantage is that it allows the use of more general robust functions.

Implementation Details

In this section we provide details about the implementation of the numerical optimization schemes for our grayscale and vector-valued multi-frame subspace optical flow algorithms.

We used a numerical optimization scheme and image preprocessing6 similar to those proposed in Wedel et al. (2009) to minimize the energy (6): we use the structure-texture decomposition to make our input robust to illumination artifacts due to shadows and shading reflections, and we also use blended versions of the image gradients and a median filter to reject flow outliers. Concerning the choice of parameters, the default values proposed in Wedel et al. (2009) for the ITV-L1 algorithm were found to give the best results for ITV-L1 and for our method on the benchmark sequence (5 warp iterations, 20 alternation iterations, and the two weights set to 30 and 2). The same settings were used in all our experiments on real sequences. Note that when we ran the colour version of our algorithm we downweighted the value of Inline graphic by a factor of Inline graphic to account for the three colour channels. Regarding the parameters of the space-varying weight of the regularization term defined in (11), we used the following values: Inline graphic pixel, Inline graphic and Inline graphic
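The structure-texture preprocessing removes the smooth, illumination-carrying component from each frame and keeps a blended texture residual. Wedel et al. (2009) compute the structure component with ROF denoising; the sketch below approximates it with a simple box blur purely for illustration, and the blend factor 0.95 is an assumption:

```python
import numpy as np

def box_blur(img, k=5):
    """Separable-free k x k box blur with edge padding (a crude stand-in
    for the ROF structure component)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def structure_texture(img, blend=0.95):
    """Subtract a blended smooth 'structure' component, keeping texture."""
    return img - blend * box_blur(img)

# A smooth illumination ramp is largely removed, while fine detail survives,
# so the output has less overall variation than the input.
ramp = np.tile(np.linspace(0, 1, 64), (64, 1))
texture = np.indices((64, 64)).sum(axis=0) % 2 * 0.2   # checkerboard detail
out = structure_texture(ramp + texture)
assert np.std(out) < np.std(ramp + texture)
```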

Since our algorithm can be efficiently parallelized on standard graphics hardware, we have developed a GPU implementation using the CUDA framework. We ran our algorithm on an NVIDIA GTX-580 GPU card hosted on a dual-core CPU and obtained an average speedup of Inline graphic with respect to our CPU Matlab implementation, which runs on a server with four quad-core processors and 192 GB of memory.

Reparameterization of the Optical Flow: Hard Subspace Constraint

In the special case where the error Inline graphic in (2) is close to zero everywhere in the image, or equivalently when Inline graphic in (6), our soft constraint becomes a hard constraint and the optical flow Inline graphic can be reparameterized as:

$$\mathbf{u}(\mathbf{x}) = Q\,\mathbf{L}(\mathbf{x}) \qquad (34)$$

where the coefficients of the motion basis are the unknown variables. In this case the energy for vector-valued images with C channels can be rewritten as:

$$\sum_{f=1}^{F} \int_{\Omega} \varphi\left( I_f\left( \mathbf{x} + Q_f\,\mathbf{L}(\mathbf{x}) \right) - I_0(\mathbf{x}) \right) d\mathbf{x} + \sum_{k=1}^{R} \int_{\Omega} g(\mathbf{x}) \left| \nabla L_k(\mathbf{x}) \right|_{\varepsilon} d\mathbf{x} \qquad (35)$$

where Q_f is the 2 × R matrix formed by the two rows of the basis matrix Q that correspond to frame f. Appendix C describes a primal-dual optimization algorithm to minimize this energy obtained via reparameterization of the flow.

A valid question at this point is: how does this hard subspace constraint compare with our proposed soft constraint? In Sect. 3 we argued that a soft constraint provides increased robustness. For this reason, in Sect. 10 we conduct a thorough experimental comparison between the two approaches, which reveals that it is indeed beneficial to allow deviations from the subspace: our robust soft constraint consistently outperforms imposing a hard constraint via reparameterization of the optical flow.

Experimental Results

In this section we evaluate our method and compare its performance with state of the art optical flow (Brox and Malik 2011; Zach et al. 2007) and image registration (Pizarro and Bartoli 2010) algorithms. We show quantitative comparative results on our new benchmark ground truth optical flow dataset and qualitative results on real-world sequences7.

Furthermore, we analyze the sensitivity of our algorithm to some of its parameters, such as the choice of trajectory basis and the regularization weight. Since our algorithm computes multi-frame optical flow and incorporates an implicit temporal regularization term, it would have been natural to compare its performance with a spatiotemporal optical flow formulation such as Weickert and Schnörr (2001b). However, due to the lack of publicly available implementations, we chose to compare with LDOF (Large Displacement Optical Flow) (Brox and Malik 2011), one of the best performing optical flow algorithms, which can deal with large displacements by integrating rich feature descriptors into a variational optical flow approach to compute dense flow. We also compare against the duality-based ITV-L1 (Improved TV-L1) algorithm (Wedel et al. 2009), which we use as a baseline since our method can be seen as its generalization to the case of multi-frame non-rigid optical flow via robust trajectory subspace constraints (see Sect. 4). In both cases, we register each frame in the sequence independently with the reference frame. We also compare with Pizarro and Bartoli's state of the art keypoint-based non-rigid registration algorithm (Pizarro and Bartoli 2010).

Note that all these algorithms can only be used on grayscale images.

Construction of a Ground Truth Benchmark Dataset

For the purpose of quantitative evaluation of multi-frame non-rigid optical flow, we have generated a new benchmark sequence with ground truth optical flow data. To the best of our knowledge, this is one of the first attempts to generate a long image sequence of a deformable object with dense ground truth 2D trajectories. We use sparse motion capture (MOCAP) data from White et al. (2007) to capture the real deformations of a waving flag in 3D. This sparse data is interpolated to create a continuous dense 3D surface, using the motion capture markers as the control points for smooth spline interpolation. Figure 4 shows four frames of the (a) sparse and (b) dense interpolated 3D flag surface. This dense 3D surface is then projected synthetically onto the image plane using an orthographic camera. We use texture mapping to associate texture with the surface while rendering 60 frames of size 500 × 500 pixels. We provide both grayscale and colour sequences. The advantage of this new sequence is that, since it is based on MOCAP data, it captures the complex natural deformations of a real non-rigid object while giving us access to dense ground truth optical flow. We have also used three degraded versions of the original rendered sequences, obtained by adding (i) Gaussian noise with standard deviation 0.2 relative to the range of image intensities, (ii) salt & pepper (S&P) noise of density 10%, and (iii) synthetic occlusions generated by superimposing black circles of radius 20 pixels moving in linear orbits. Figure 4 shows four frames of the original colour sequence, the ground truth optical flow, and the equivalent frames of the grayscale sequence with synthetic occlusions, Gaussian noise and salt & pepper noise.

Fig. 4.

Fig. 4

Rendering process for the ground truth optical flow sequence of a non-rigid object (different frames in each row). (a) Sparse surface representing the MOCAP data (White et al. 2007), (b) dense surface constructed using thin plate spline interpolation, (c) ground truth optical flow visualized with the colour coding shown in the figure, (d) colour sequence rendered from the dense surface using texture mapping of a graffiti image, (e) grayscale version of the same sequence, with superimposed red disks indicating the regions where intensities are replaced by black in the case of synthetic occlusions, (f) grayscale sequence with synthetic Gaussian noise, (g) grayscale sequence with synthetic salt and pepper noise (Color figure online)

Quantitative Results on Benchmark Sequence

We tested our Multi-Frame Subspace Flow algorithm for grayscale (mfsf) and colour (mfsf c) images using the three different proposed motion bases: PCA, DCT and Cubic B-Spline (Figs. 5, 6). In Table 1, we provide a quantitative comparison of the performance of the different versions of our algorithm against the state of the art methods listed above, using the four different versions of the rendered flag sequence as input. We report the root mean square (RMS) of the endpoint error, i.e. the magnitude of the difference between the ground truth and the estimated flow. These measures are computed over all frames and all foreground pixels. Note that the results obtained with the Spline basis were omitted since they were almost equivalent to those obtained with the DCT basis, as Fig. 6a reveals.
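The evaluation metric can be computed as follows: the RMS of the endpoint error, pooled over all foreground pixels of all frames. The array shapes and the foreground mask convention are our illustrative assumptions:

```python
import numpy as np

def rms_endpoint_error(flow_est, flow_gt, foreground):
    """flow_est, flow_gt: (F, 2, H, W) flows; foreground: (H, W) bool mask.
    Returns sqrt(mean of squared endpoint error) over masked pixels/frames."""
    diff = flow_est - flow_gt
    epe2 = np.sum(diff**2, axis=1)        # squared endpoint error per pixel
    return np.sqrt(np.mean(epe2[:, foreground]))

# A constant (3, 4)-pixel error everywhere yields an RMS error of 5 pixels.
F, H, W = 4, 8, 8
gt = np.zeros((F, 2, H, W))
est = gt + np.array([3.0, 4.0])[None, :, None, None]
mask = np.ones((H, W), dtype=bool)
assert np.isclose(rms_endpoint_error(est, gt, mask), 5.0)
```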

Fig. 5.

Fig. 5

Inverse warps and error maps for selected frames of the original benchmark sequence. Each row shows results for a different method. (a-b) Multi-frame subspace flow on colour images: (a) mfsf c (PCA), (b) mfsf c (DCT). (c-d) Multi-frame subspace flow on grayscale images: (c) mfsf (PCA), (d) mfsf (DCT). (e) ITV-L1 (Wedel et al. 2009), (f) LDOF (Brox and Malik 2011), (g) Pizarro and Bartoli (2010)

Fig. 6.

Fig. 6

(a) RMS flow error vs increasing values of the rank of the different trajectory bases (PCA, DCT, UCBS). The graph shows that the PCA motion basis provides the best results and that our algorithm does not overfit when the rank of the basis is overestimated. (b) RMS flow error vs increasing values of the weight of the subspace constraint. (c) RMS flow error for increasing values of the rank of the PCA basis on the different variants of the benchmark sequence (occlusions, Gaussian noise, salt & pepper noise). All experiments use our grayscale multi-frame subspace flow algorithm (mfsf)

Table 1.

RMS endpoint errors in pixels on the benchmark sequences of our proposed method for colour (mfsf c) and grayscale (mfsf) images using different motion bases (PCA, DCT and the identity basis)

Image type  Method                        Original  Occlusions  Gauss. noise  S&P noise
Color       mfsf c (PCA)                  0.69      0.80        1.25          1.01
Color       mfsf c (DCT)                  0.80      1.00        1.52          1.17
Grayscale   mfsf (PCA)                    0.75      0.85        1.52          1.18
Grayscale   mfsf (DCT)                    0.89      1.12        1.84          1.38
Grayscale   mfsf (identity basis)         1.13      1.43        1.83          1.60
Grayscale   ITV-L1 (Wedel et al. 2009)    1.43      1.89        2.61          2.34
Grayscale   LDOF (Brox and Malik 2011)    1.71      2.01        4.35          5.05
Grayscale   Pizarro and Bartoli (2010)    1.24      1.27        1.94          1.79

We compare the different versions of our grayscale algorithm (mfsf) against state of the art optical flow methods (ITV-L1, Wedel et al. 2009; LDOF, Brox and Malik 2011) and the non-rigid registration method of Pizarro and Bartoli (2010)

Numbers in bold highlight best performing color/grayscale algorithm

Fig. 7.

Fig. 7

Flow error maps on the benchmark sequence with synthetic occlusions for selected frames. Each column shows results for a different method and errors are displayed as heatmaps. (a-b) Multi-frame subspace flow on colour images: (a) mfsf c (PCA), (b) mfsf c (DCT). (c-d) Multi-frame subspace flow on grayscale images: (c) mfsf (PCA), (d) mfsf (DCT). (e) ITV-L1, Wedel et al. (2009), (f) LDOF, Brox and Malik (2011), (g) Pizarro and Bartoli (2010). It is easy to see from the error maps that the colour versions of our algorithm, (a) and (b), improve substantially on their grayscale counterparts, (c) and (d)

First we compare the performance of our original algorithm for grayscale images (mfsf) with ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011) and Pizarro and Bartoli (2010), since these algorithms can only be used on grayscale images. We report results for our algorithm using the full rank (Inline graphic) DCT basis (mfsf Inline graphic) and a full rank PCA basis (mfsf Inline graphic). Note that the PCA basis was estimated using as input the flow obtained after running our algorithm with the DCT basis (mfsf Inline graphic). We also ran our algorithm using the identity matrix as the basis (mfsf Inline graphic) to show how the results degrade when subspace constraints are not applied to compute the multi-frame optical flow.

Table 1 shows that our proposed algorithms (mfsf Inline graphic) and (mfsf Inline graphic) are the best performing grayscale algorithms, outperforming all other methods and yielding the lowest RMS errors on all the sequences: original, occlusions, Gaussian noise and salt & pepper noise. The best results are obtained using the PCA basis.
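The RMS endpoint errors reported in Tables 1 and 2 follow the standard definition for dense flow fields. As a minimal illustrative sketch (not the authors' evaluation code; the function name is ours), the metric can be computed as:

```python
import numpy as np

def rms_endpoint_error(flow_est, flow_gt):
    """RMS endpoint error in pixels between an estimated and a
    ground-truth flow field, each of shape (H, W, 2)."""
    diff = flow_est - flow_gt
    epe = np.sqrt((diff ** 2).sum(axis=-1))   # per-pixel endpoint error
    return np.sqrt((epe ** 2).mean())         # root mean square over pixels

# toy example: a constant (3, 4) offset gives an RMS error of exactly 5
gt = np.zeros((4, 4, 2))
est = gt + np.array([3.0, 4.0])
print(rms_endpoint_error(est, gt))  # prints 5.0
```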

Moreover, the top two rows of Table 1 show that using the novel extension of our algorithm to colour images (mfsf c) described in Sect. 7 significantly improves the results on all versions of the sequence. Once more, the results obtained using a full rank PCA basis (mfsf c Inline graphic) outperform those obtained with the DCT basis (mfsf c Inline graphic).

Regarding the choice of parameters, as we described in Sect. 8, the default values proposed in Wedel et al. (2009) for the ITV-Inline graphic algorithm were also found to give the best results with our grayscale algorithm (mfsf).8

However, we found that these parameters needed some tuning on the noisy and occluded versions of our benchmark sequence. A lower value of the data term weight Inline graphic was found to provide the best results. Additionally, on the noisy sequences, the weight of the quadratic term was lowered to Inline graphic. These modified values were used for mfsf Inline graphic, mfsf Inline graphic and mfsf Inline graphic.

Figure 5 shows a visual comparison of the results on the benchmark sequence reported in Table 1. We show a closeup of the reverse warped images Inline graphic of three frames in the sequence (Inline graphic), which should look identical to the template frame, and the error in the flow estimation Inline graphic for the same frames, expressed in pixels and encoded as a heatmap. Notice the significant improvements that our proposed algorithms for colour images (mfsf c Inline graphic, mfsf c Inline graphic) show with respect to their grayscale counterparts (mfsf Inline graphic, mfsf Inline graphic). Overall, all our approaches outperform the state of the art methods: ITV-Inline graphic optical flow (Wedel et al. 2009), LDOF (Brox and Malik 2011) and Pizarro and Bartoli's registration algorithm (Pizarro and Bartoli 2010).

Figure 7 shows results of the experiments on the benchmark sequence with synthetic occlusions. The error maps Inline graphic for images (Inline graphic), encoded as heatmaps, are shown for all the variants of our grayscale (mfsf Inline graphic, mfsf Inline graphic) and colour (mfsf c Inline graphic, mfsf c Inline graphic) algorithms as well as for ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011) and Pizarro and Bartoli (2010). We observe the same behaviour as in the experiments without occlusions: the error maps obtained with our algorithms show superior performance compared with state of the art approaches. Amongst our proposed approaches, one can observe significant improvements of the colour versions over their grayscale equivalents.

Figure 6a plots the RMS error over all frames of the optical flow estimated with the three different bases, for varying values of the rank and of the weight Inline graphic associated with the soft constraint. For a reasonably large value of Inline graphic, all the bases can be used with a significant reduction in the rank. The optimization also does not appear to overfit when the dimensionality of the subspace is overly high. Figure 6c confirms the same behaviour in the case of noisy images and sequences with occlusions. Figure 6b explores the effect of varying the weight Inline graphic on the accuracy of the optical flow. While low values of Inline graphic cause numerical instability (the data and regularization terms become completely decoupled), high values of Inline graphic lead to slow convergence and errors, since the point-wise search is not allowed to leave the manifold, simulating a hard constraint. Another interesting observation is that our proposed method with a PCA basis of rank Inline graphic=50 yields better performance than with a full rank PCA basis Inline graphic=120. This reflects the fact that the temporal regularization due to the low dimensional subspace is often beneficial. Note that, to analyze the sensitivity of our algorithm to its parameters in Fig. 6a–c, we used ground truth tracks to compute the PCA basis so as to remove any bias introduced by tracking.

Experimental Comparison of Soft Versus Hard Subspace Constraint

In this section we use the synthetic grayscale flag sequence to conduct an experimental comparison of the optical flow obtained using our proposed soft subspace constraint with that obtained imposing the hard constraint described in Sect. 9. The energy associated with the hard constraint (59) can be obtained by removing the quadratic term Inline graphic from our energy (6) and reparameterizing the optical flow in terms of the trajectory coefficients.

We use the primal-dual algorithm described in Appendix C to minimise the reparameterized energy (59), running 200 iterations per warp. We observed that 200 iterations were enough for the cost function to converge to a reasonable tolerance (which we consider to be when the change in cost per iteration is Inline graphicth of the total change).

Our energy (6), based on the soft subspace constraint, is minimized using our optimization scheme described in Sect. 5. To establish a fair comparison, we used 20 denoising iterations for the regularization step and 20 alternation iterations between the minimisation of Step 1 and Step 2 to ensure convergence.

Table 2 reports the RMS endpoint error, measured in pixels, of the flow obtained with the soft (S) and hard (H) constraints using three different bases:

  1. Low rank (Inline graphic) PCA basis obtained from sparse tracking using Pizarro and Bartoli (2010).

  2. Full rank PCA basis obtained from ground truth optical flow.

  3. Full rank DCT basis.
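For reference, a full rank DCT trajectory basis of the kind used in the third experiment can be built as an orthonormal DCT-II matrix whose columns are low-frequency trajectories. This is a hedged sketch with our own function name; the paper does not specify its exact DCT construction.

```python
import numpy as np

def dct_basis(F, K):
    """Orthonormal DCT-II basis: K lowest-frequency trajectories of length F.
    Returns an (F, K) matrix whose columns are the basis trajectories."""
    t = np.arange(F)
    # column k: cos(pi * (2t + 1) * k / (2F)), scaled for orthonormality
    Phi = np.cos(np.pi * (2 * t[:, None] + 1) * np.arange(K)[None, :] / (2 * F))
    Phi *= np.sqrt(2.0 / F)
    Phi[:, 0] /= np.sqrt(2.0)   # DC column rescaled so all columns have unit norm
    return Phi

Phi = dct_basis(120, 120)       # full rank basis for a 120-frame sequence
print(np.allclose(Phi.T @ Phi, np.eye(120)))  # prints True: orthonormal columns
```

Truncating to the first K columns gives the low-rank variant; unlike the PCA basis, this requires no initial tracking.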

The comparative results in Table 2 show that the optical flow obtained with our soft constraint consistently outperforms the flow obtained after reparameterization (hard constraint) in all three experiments on all the different sequences (original, noisy and with occlusions). This is particularly the case in the presence of Gaussian noise, where the endpoint errors differ most. This is to be expected, since our soft constraint allows some deviations from the subspace manifold.

Table 2.

RMS endpoint error in pixels for the optical flow obtained with the hard (H) versus soft (S) constraints

Basis Rank Constraint Version of input sequence:
Original Occl. Gauss. noise S&P noise
Sparse PCA 75 Soft (S) 0.90 1.01 1.80 1.46
Hard (H) 0.98 1.05 2.22 1.60
GT PCA 120 Soft (S) 0.69 0.76 1.43 1.07
Hard (H) 0.70 0.77 1.65 1.08
DCT 120 Soft (S) 0.89 1.12 1.83 1.38
Hard (H) 1.09 1.28 2.00 1.42

We carry out 3 experiments using: (top) a low-rank sparse PCA basis (using tracks given by  Pizarro and Bartoli (2010)); (middle) a full rank ground truth PCA basis (computed using the ground truth optical flow); and (bottom) a full rank DCT basis. The algorithms were tested on all the different types of sequence (original, noisy and with occlusions)

In the first experiment we used a low rank PCA basis estimated from sparse tracking (obtained using Pizarro and Bartoli's matching algorithm (Pizarro and Bartoli 2010)) to test the case of an inaccurate basis. This is the case in which allowing deviations from the subspace manifold is most clearly beneficial, which is naturally reflected in the significantly higher endpoint errors of the flow computed with the hard constraint compared with that computed with our soft constraint.

It is also interesting to observe that even when we used the full rank PCA basis computed from the ground truth flow, the soft constraint performs marginally better than the hard constraint; in the sequence with Gaussian noise it provides a clearer benefit. Finally, the third experiment, with a full rank DCT basis, also shows that it is beneficial to use a soft constraint on all the different image sequences.

In conclusion, the optical flow obtained using the subspace constraint as a soft constraint consistently outperforms the flow obtained by reparameterization when both algorithms are run until convergence. The benefits of the soft constraint are strongest when dealing with noisy images and with an inaccurate motion basis, which is to be expected.

Experiments on Real Sequences

In this section we provide details about the experiments we have carried out on four video sequences which display large displacements and strong deformations.

Actor sequence

This challenging sequence is a 39 frame long clip from a well known film, acquired at Inline graphic frames per second with images of size Inline graphic pixels. The top two rows of Fig. 8 show Inline graphic frames of this sequence in grayscale and colour. Note that frame Inline graphic was used as the reference frame.9 The bottom four rows in Fig. 8 show comparative results of the inverse warp images (using the computed optical flow to warp the current image back to the reference frame) estimated using the following different versions of our algorithm: mfsf Inline graphic, mfsf Inline graphic, mfsf c Inline graphic, mfsf c Inline graphic. The first two methods work on grayscale images and use the identity matrix and the PCA basis as the motion basis respectively, while the last two are their colour equivalents. Comparing the results of mfsf Inline graphic and mfsf Inline graphic (or mfsf c Inline graphic and mfsf c Inline graphic) allows us to show the advantages of using subspace constraints (PCA basis) over not using a temporal model for the trajectories (Inline graphic basis). We use a full rank PCA basis obtained after applying principal components analysis to an initial flow estimated with our algorithm using the DCT basis.

Fig. 8.

Fig. 8

Results on the Actor sequence: (a–b) Some frames of the grayscale and colour input sequences. This is a challenging sequence with large displacements and strong deformations. Frame 31 Inline graphic is used as the reference frame. (c–d) Inverse warp images Inline graphic comparing two versions of our grayscale algorithm: (c) without subspace constraints (mfsf Inline graphic) and (d) with subspace constraints (mfsf Inline graphic). (e–f) Inverse warp images Inline graphic comparing two versions of our colour algorithm: (e) without subspace constraints (mfsf c Inline graphic) and (f) with subspace constraints (mfsf c Inline graphic)

The advantages of using subspace constraints are clear. For instance, notice that for grayscale images mfsf Inline graphic failed completely to warp frame Inline graphic, while mfsf Inline graphic provides an accurate inverse warp image for the same frame and consistently superior results throughout the sequence. It is also clear that making use of all three colour channels through the extension of our algorithm to vector valued images provides substantial improvements. Both mfsf c Inline graphic and mfsf c Inline graphic outperform their grayscale equivalents. In row (d) of Fig. 8 we have highlighted in red the areas where the flow has clearly failed for the grayscale mfsf Inline graphic algorithm but that have been correctly warped by its colour version mfsf c Inline graphic.

Notice also that mfsf c Inline graphic copes with the large displacements in frame Inline graphic much better than mfsf Inline graphic. However, just using colour without subspace constraints is not enough to estimate accurate flow. Comparing the bottom two rows of Fig. 8 reveals that using subspace constraints significantly improves results in the colour case too. In conclusion, the best overall results are obtained with mfsf c Inline graphic, our colour algorithm with subspace constraints using the PCA basis.

Figures 9 and 10 support our claims by showing a grid superimposed on the images to reveal the optical flow at a sparse subset of points. The points on the mouth are highlighted in yellow since that is where most of the deformation occurs. Once more, Fig. 9 reveals that the quality of the flow computed using trajectory regularization constraints on grayscale images (mfsf Inline graphic) is far better than that obtained without subspace constraints (mfsf Inline graphic). Notice the complete failure of mfsf Inline graphic on frame Inline graphic. Similar conclusions can be drawn from the results on the colour images shown in Fig. 10. Notice the improvements, particularly on the lips.

Fig. 9.

Fig. 9

Results on the grayscale Actor sequence: Top row (a) shows some frames of the original grayscale sequence. Middle (b) and bottom (c) rows compare the optical flow results obtained with two of our proposed grayscale algorithms: (c) with subspace constraints (mfsf Inline graphic) and (b) without subspace constraints (mfsf Inline graphic). The flow is visualized with a grid superimposed on the images to reveal the optical flow in a sparse subset of points. Points on the mouth are shown in yellow to highlight the results on the area with strongest deformations (Color figure online)

Fig. 10.

Fig. 10

Results on the colour Actor sequence: Top row (a) shows some frames of the original colour sequence. Middle (b) and bottom (c) rows compare the optical flow results obtained with two of our proposed colour algorithms: (c) with subspace constraints (mfsf c Inline graphic) and (b) without subspace constraints (mfsf c Inline graphic). The flow is visualized with a grid superimposed on the images to reveal the optical flow in a sparse subset of points. Points on the mouth are shown in yellow to highlight the results on the area with strongest deformations (Color figure online)

Actress sequence

This Inline graphic frame long clip from the same film shows a close-up of an actress opening her mouth widely. The resolution of the images is Inline graphic pixels. This sequence is similarly challenging to the previous one, with very large displacements and deformations. In this case we only ran our best performing method on grayscale images, mfsf Inline graphic, with subspace constraints using a PCA basis of rank Inline graphic. Figure 11 shows the original sequence (top row); the inverse warp images estimated from the optical flow (middle row); and the original images augmented with some texture (bottom row) to simulate a tattoo.

Fig. 11.

Fig. 11

Results on the Actress sequence: (a) Some frames of the original grayscale sequence. (b) Inverse warp images obtained with our best performing grayscale method using subspace constraints (mfsf Inline graphic). (c) Original images augmented with some texture to simulate a tattoo

Paper bending-1 sequence

Figure 12 shows results on a sequence of textured paper bending smoothly (Bartoli et al. 2008); a challenging sequence due to its length (Inline graphic frames) and the large camera rotation. We compare our best performing grayscale algorithm (mfsf Inline graphic) against the state of the art optical flow methods ITV-Inline graphic (Wedel et al. 2009) and LDOF (Brox and Malik 2011). For completeness in our experimental evaluation, in this case we computed the motion basis by applying PCA to KLT tracks (Lucas and Kanade 1981), keeping the first 12 components. We ran the LDOF and ITV-Inline graphic algorithms using a multi-resolution scaling factor of 0.95, whereas for our algorithm the value 0.75 was sufficient (pointing to faster convergence). Comparing the warped images Inline graphic, we observe that our method yields a significant improvement in the accuracy of the optical flow, especially after some frames (see e.g. the artifacts annotated by the red ellipses in the results of LDOF and ITV-Inline graphic). We also show an alternative visualization of the same results, with a grid superimposed on the images to reveal the optical flow at a sparse subset of points. This visualization helps to highlight the superiority of the optical flow estimated with our algorithm (mfsf Inline graphic) over the others.

Fig. 12.

Fig. 12

Results on the Paper bending-1 grayscale sequence: Comparative results of the optical flow estimated with our best performing grayscale algorithm (mfsf Inline graphic) against state of the art optical flow methods (ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011)). We show two visualizations of the optical flow estimated with the three methods in alternate rows: (i) the inverse warped images and (ii) a grid superimposed on the images to reveal the optical flow at a sparse subset of points. Top row shows some frames of the original sequence

In Fig. 13 we show results on the colour version of this sequence, subsampled by taking every fifth frame to give a Inline graphic frame long sequence. Here we augment the images with new texture using the optical flow given by our colour multi-frame subspace algorithm with a PCA basis (mfsf c Inline graphic). The full rank PCA basis was obtained by applying principal components analysis to an initial flow estimated with our algorithm using the DCT basis (mfsf c Inline graphic).

Fig. 13.

Fig. 13

Results on the Paper bending-1 colour sequence: The top row shows some frames of the original colour sequence. The bottom row displays the same sequence augmented with some new texture. The optical flow obtained with our best performing colour algorithm mfsf c Inline graphic was used to re-texture the original sequence

Paper bending-2 sequence

Figure 14 shows a Inline graphic frame long grayscale sequence, introduced in Varol et al. (2009), of a paper being bent backwards, which is widely used for 3D reconstruction in non-rigid structure from motion (NRSfM). Our method used a PCA basis of rank Inline graphic obtained from KLT tracks. The Inline graphicth frame is used as the reference. Once more, we compare results of our algorithm (mfsf Inline graphic) against the same state of the art approaches as in previous experiments. The inverse warped images and the colour coded optical flow in Fig. 14 reveal that, despite having used a very low rank PCA motion basis, our results outperform LDOF and provide more accurate flow boundaries than ITV-Inline graphic.

Fig. 14.

Fig. 14

Results on the Paper bending-2 sequence: Top row shows some images of this grayscale sequence. The 30th frame is used as the reference. Next rows show inverse warp images and colour coded optical flow comparing our best performing grayscale algorithm (mfsf Inline graphic) using a very low rank PCA decomposition (Inline graphic) against state of the art optical flow methods (ITV-Inline graphic (Wedel et al. 2009), LDOF (Brox and Malik 2011))

Conclusions

We have provided a new formulation for the computation of multi-frame optical flow exploiting the high correlation between 2D trajectories of points in a long sequence by assuming that these lie close to a low dimensional subspace. Our main contribution is to formulate the manifold constraint as a soft constraint which, using variational principles, leads to a robust energy that can be efficiently optimized. We propose a new anisotropic trajectory regularization term that acts on the coefficients of the trajectory basis. We take advantage of the high level of parallelism inherent to our approach by developing a GPU implementation using the Nvidia CUDA framework. We also provide an extension of our approach to the case of vector-valued images which allows us to exploit all three colour channels and gain substantial improvements in the accuracy of the estimated optical flow. We also provide a new benchmark dataset, with ground truth optical flow. Our experimental results on the benchmark dataset and on real video footage reveal that using subspace constraints significantly improves results. Our approach outperforms state of the art optical flow and non-rigid registration algorithms.

Acknowledgments

This work is supported by the European Research Council under ERC Starting Grant agreement 204871-HUMANIS. We thank T. Collins for his texture mapping code and D. Pizarro for providing results of their method (Pizarro and Bartoli 2010) and tracks for the synthetic sequence. We also thank A. Handa and L. Pizarro for fruitful discussions.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix A: Primal Dual Algorithm for Denoising

This appendix describes the optimization of the energy minimized in Step 1 of our algorithm as defined in (22):

graphic file with name M514.gif 36

which corresponds to a small modification of the TV-Inline graphic Rudin-Osher-Fatemi (ROF) model (Rudin et al. 1992), as described in Sect. 5.1. Note that, as the trajectory model coefficients Inline graphic in (22) are decoupled for each Inline graphic, in the following derivation we drop the subscript for simplicity.

The first step in the optimization is the dualisation of the weighted Huber functional Inline graphic of the above energy with respect to the gradient Inline graphic using its Legendre-Fenchel transform (Rockafellar 1997). After spatial discretization, the minimisation of (36) is equivalent to the following saddle point problem:

graphic file with name M520.gif 37

where Inline graphic is the set of image grid points, Inline graphic denotes the discrete gradient operator as defined in Chambolle and Pock (2011), Inline graphic are the dual variables for every Inline graphic and Inline graphic is the indicator function of the unit ball:

graphic file with name M526.gif 38

The problem (37) can be considered as a special case of the following general form of primal-dual problems that are studied in Chambolle and Pock (2011):

graphic file with name M527.gif 39

In the case of (37), the norm of the linear operator Inline graphic is bounded by Inline graphic. Also, both Inline graphic and Inline graphic are uniformly convex, with convexity parameters Inline graphic and Inline graphic respectively.

Therefore, we solve (37) by applying Algorithm 3 of Chambolle and Pock (2011). The steps of the algorithm can be written as follows:

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:

graphic file with name M536.gif 40
graphic file with name M537.gif 41
graphic file with name M538.gif 42

where Inline graphic is the discrete divergence operator and the operator Inline graphic projects a vector Inline graphic onto the unit ball as:

graphic file with name M542.gif 43

We choose the following values for the step sizes Inline graphic, which guarantee convergence:

graphic file with name M544.gif 44
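To make the structure of this primal-dual scheme concrete, the sketch below implements the plain ROF special case (isotropic TV with a quadratic data term, uniform weights, no Huber smoothing) using the standard primal-dual iterations of Chambolle and Pock. The operators, step sizes and names are our illustrative assumptions, not the paper's exact weighted formulation.

```python
import numpy as np

def grad(u):
    # forward differences with Neumann (replicate) boundary conditions
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    # discrete divergence: negative adjoint of the forward-difference gradient
    d = np.zeros_like(px)
    d[:, :-1] += px[:, :-1]; d[:, 1:] -= px[:, :-1]
    d[:-1, :] += py[:-1, :]; d[1:, :] -= py[:-1, :]
    return d

def rof_denoise(f, lam=8.0, n_iter=200):
    """min_u ||grad u||_1 + lam/2 ||u - f||^2 via a primal-dual scheme."""
    u = f.copy(); u_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    tau = sigma = 1.0 / np.sqrt(8.0)     # tau * sigma * L^2 <= 1, with L^2 = 8
    for _ in range(n_iter):
        # dual ascent followed by pointwise projection onto the unit ball
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px, py = px / norm, py / norm
        # primal descent: closed-form prox of the quadratic data term
        u_old = u
        u = (u + tau * (div(px, py) + lam * f)) / (1.0 + tau * lam)
        u_bar = 2 * u - u_old            # over-relaxation step
    return u
```

Denoising a noisy flat image with this routine reduces the error with respect to the clean image, which is the role played by Step 1 in the alternation scheme.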

Appendix B: Primal Dual Algorithm for Robust Vector-Valued Image Matching

This appendix provides the details of the algorithm to optimise the saddle point problem (32) for vector-valued images using Euclidean norm and Huber penalisers.

Euclidean Norm Penaliser

This case corresponds to Inline graphic and is a straightforward extension of the absolute value of image differences that we used for Inline graphic in (7) for grayscale images. After dualisation, (32) can be written as:

graphic file with name M547.gif 45

This problem is also a special case of the general saddle point problem (39), with the linear operator Inline graphic. Since the function Inline graphic is uniformly convex with convexity parameter Inline graphic, we apply Algorithm 2 of Chambolle and Pock (2011) and derive the following optimisation algorithm:

  • Choose Inline graphic

  • Initialize Inline graphic from the previous alternation iteration.

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:
    graphic file with name M555.gif 46
    graphic file with name M556.gif 47
    graphic file with name M557.gif 48
    graphic file with name M558.gif 49

where Inline graphic can be any upper bound on the norm of Inline graphic. Although the saddle point problem is minimised separately for each spatio-temporal point of the video and Inline graphic is spatially varying, for simplicity we choose a common upper bound on the linear operator for all the points. It can be shown that Inline graphic, as defined below, is a valid upper bound:

graphic file with name M563.gif 50

where Inline graphic are the horizontal and vertical coordinate axes of the image plane.

Huber Penaliser

When the robust function used in the data term of the energy for vector-valued images is the Huber norm Inline graphic, the saddle point problem (32) can be written as:

graphic file with name M566.gif 51

This problem is again of the form (39), with the linear operator Inline graphic. The corresponding Inline graphic and Inline graphic functions are both uniformly convex, with parameters Inline graphic and Inline graphic. We thus solve (51) using Algorithm 3 of Chambolle and Pock (2011) and derive the following optimisation algorithm:

  • Initialize Inline graphic from the previous alternation iteration.

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:
    graphic file with name M575.gif 52
    graphic file with name M576.gif 53
    graphic file with name M577.gif 54

We choose the following step-sizes which ensure the convergence of our algorithm:

graphic file with name M578.gif 55

where Inline graphic is, again, any upper bound on the operator norm of Inline graphic. As in the case of Euclidean norm penalisation, we choose Inline graphic as defined in (50).

Appendix C: Optimization of the Hard Subspace Constraint

This appendix describes the optimization of the energy

graphic file with name M582.gif 56

which corresponds to the case when the subspace constraint is imposed as a hard constraint and the 2D flow Inline graphic can be reparameterized as Inline graphic. First, each image channel of Inline graphic is linearised around Inline graphic using an initial estimate Inline graphic. Under this approximation the data term can be written as:

graphic file with name M588.gif 57

where, for every spatio-temporal point Inline graphic

graphic file with name M590.gif 58

is the Inline graphic Jacobian matrix and Inline graphic Inline graphic is a Inline graphic dimensional vector.

Thus, the following minimization problem must be solved:

graphic file with name M595.gif 59

where Inline graphic is the linearised colour constancy term. After dualisation of the data and regularisation terms and spatial discretization, the minimisation (59) is equivalent to the following saddle point problem:

graphic file with name M597.gif 60

where Inline graphic and Inline graphic are the dual variables for every Inline graphic and Inline graphic, respectively.

The energy (60) can be considered as a special case of the general form of primal-dual problem (39) where the linear operator Inline graphic is the Inline graphic dimensional matrix:

graphic file with name M604.gif 61

where Inline graphic are the image grid points and Inline graphic

Thus, we solve (60) by applying Algorithm 1 of Chambolle and Pock (2011). In this case, the steps of the algorithm can be written as follows:

  • Initialize Inline graphic

  • Initialize Inline graphic

  • Iterate for Inline graphic until a convergence criterion is satisfied:

graphic file with name M610.gif 62
graphic file with name M611.gif 63
graphic file with name M612.gif 64
graphic file with name M613.gif 65

We use the following step-sizes, which guarantee the convergence of this algorithm too:

graphic file with name M614.gif 66

Inline graphic is the following upper bound on the operator norm of Inline graphic (61):

graphic file with name M617.gif 67

where Inline graphic is given by (50).

Footnotes

1

The parametric warp functions used in this work include Thin Plate Spline (TPS) and Free-Form Deformations (FFD) based on 2D Cubic B-Splines.

2

After the linearization of the brightness constancy term.

3

In Weickert and Schnörr (2001a) this design principle is expressed for the classical optical flow case where the input is a single pair of frames, but here we present its straightforward extension to the case of multiple frames.

4

By specific reference point we mean that we associate the new location (after rotation) of a point on the reference image with its original location.

5

Note that, for the sake of clarity in our presentation, the generic robust function Inline graphic defined here differs from the robust function Inline graphic that we used in Sect. 4: Inline graphic is applied directly to the vectorial differences whereas Inline graphic is applied to their squared norms. The two definitions are linked by: Inline graphic

6

Note that we have normalized the image intensity values to lie between Inline graphic and Inline graphic

7

Videos of the results as well as our benchmark dataset can be found on the following URL: http://www.eecs.qmul.ac.uk/~lourdes/subspace_flow

8

Note that, as we discussed in Sect. 4, mfsf Inline graphic and ITV-Inline graphic (Wedel et al. 2009) are equivalent algorithms and should therefore provide the same results. The difference in the numerical results is due to two factors: (i) in mfsf Inline graphic, Inline graphic and Inline graphic; (ii) the ITV-Inline graphic algorithm was run with its default parameters and mfsf Inline graphic with the tuned parameters described above.

9

We choose the reference frame to be one in which the points we are interested in tracking are all visible and also to reduce the maximum displacements.

Contributor Information

Ravi Garg, Email: rgarg@eecs.qmul.ac.uk.

Anastasios Roussos, Email: troussos@eecs.qmul.ac.uk.

Lourdes Agapito, Email: lourdes@dcs.qmul.ac.uk.

References

  1. Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2008). Nonrigid structure from motion in trajectory space. In Neural Information Processing Systems, pp. 41–48.
  2. Akhter I, Sheikh Y, Khan S, Kanade T. Trajectory space: A dual representation for nonrigid structure from motion. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011;33(7):1442–1456. doi: 10.1109/TPAMI.2010.201. [DOI] [PubMed] [Google Scholar]
  3. Alvarez L., Esclarín J., Lefébure M., Sánchez J. (1999). A PDE model for computing the optical flow. In Proceedings of the XVI Congreso de Ecuaciones Diferenciales y Aplicaciones (pp. 1349–1356). Las Palmas de Gran Canaria, Spain, Sept. 1999.
  4. Alvarez, L., Weickert, J., & Sánchez, J. (Aug. 2000). Reliable estimation of dense optical flow fields with large displacements. International Journal of Computer Vision, 39(1), 41–56.
  5. Aubert G, Deriche R, Kornprobst P. Computing optical flow via variational techniques. SIAM Journal on Applied Mathematics. 1999;60(1):156–182. doi: 10.1137/S0036139998340170. [DOI] [Google Scholar]
  6. Baker S, Scharstein D, Lewis J, Roth S, Black M, Szeliski R. A database and evaluation methodology for optical flow. International Journal of Computer Vision. 2011;92:1–31. doi: 10.1007/s11263-010-0390-2. [DOI] [Google Scholar]
  7. Bartoli, A., Gay-Bellile, V., Castellani, U., Peyras, J., Olsen, S. I., & Sayd, P. (2008). Coarse-to-fine low-rank structure-from-motion. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
  8. Blomgren, P., & Chan, T. (1998). Color TV: Total variation methods for restoration of vector-valued images. IEEE Transactions on Image Processing, 7(3), 304–309. Special issue on partial differential equations and geometry-driven diffusion in image processing and analysis.
  9. Brand, M. (2001). Morphable models from video. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 456–463.
  10. Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 690–696.
  11. Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In T. Pajdla & J. Matas (Eds.), European Conference on Computer Vision—ECCV 2004, Part IV (LNCS, Vol. 3024, pp. 25–36). Berlin: Springer.
  12. Brox, T., & Malik, J. (2011). Large displacement optical flow: Descriptor matching in variational motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3), 500–513.
  13. Chambolle, A. (2004). An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20, 89–97. doi:10.1023/B:JMIV.0000011320.81911.38.
  14. Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120–145.
  15. Deriche, R., Kornprobst, P., & Aubert, G. (1995). Optical-flow estimation while preserving its discontinuities: A variational approach. In Proceedings of the Second Asian Conference on Computer Vision (Vol. 2, pp. 290–295). Singapore, Dec. 1995.
  16. Esser, E., Zhang, X., & Chan, T. F. (2010). A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4), 1015–1046. doi:10.1137/09076934X.
  17. Garg, R., Pizarro, L., Rueckert, D., & Agapito, L. (2010). Dense multi-frame optic flow for non-rigid objects using subspace constraints. In Asian Conference on Computer Vision, pp. 460–473.
  18. Garg, R., Roussos, A., & Agapito, L. (2011). Robust trajectory-space TV-L1 optical flow for non-rigid sequences. In 8th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pp. 300–314.
  19. Horn, B., & Schunck, B. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203. doi:10.1016/0004-3702(81)90024-2.
  20. Irani, M. (2002). Multi-frame correspondence estimation using subspace constraints. International Journal of Computer Vision, 48(3), 173–194.
  21. Kumar, A., Tannenbaum, A. R., & Balas, G. J. (1996). Optic flow: A curve evolution approach. IEEE Transactions on Image Processing, 5(4), 598–610.
  22. Liu, C., Yuen, J., & Torralba, A. (2011). SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978–994.
  23. Lucas, B., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence.
  24. Newcombe, R., Lovegrove, S., & Davison, A. (2011). DTAM: Dense tracking and mapping in real-time. In International Conference on Computer Vision, pp. 2320–2327.
  25. Nir, T., Bruckstein, A. M., & Kimmel, R. (2008). Over-parameterized variational optical flow. International Journal of Computer Vision, 76, 205–216.
  26. Papadakis, N., Corpetti, T., & Mémin, E. (2007). Dynamically consistent optical flow estimation. In International Conference on Computer Vision (pp. 1–7). Rio de Janeiro, Brazil.
  27. Papenberg, N., Bruhn, A., Brox, T., Didas, S., & Weickert, J. (2006). Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision, 67(2), 141–158.
  28. Pizarro, D., & Bartoli, A. (2010). Feature-based deformable surface detection with self-occlusion reasoning. In International symposium on 3D data processing, visualization and transmission, 3DPVT’10.
  29. Pock, T., & Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In International Conference on Computer Vision, pp. 1762–1769.
  30. Pock, T., Cremers, D., Bischof, H., & Chambolle, A. (2010). Global solutions of variational models with convex regularization. SIAM Journal on Imaging Sciences, 3(4), 1122–1145.
  31. Rakêt, L. L., Roholm, L., Nielsen, M., & Lauze, F. (2011). TV-L1 optical flow for vector valued images. In 8th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), pp. 329–343.
  32. Ricco, S., & Tomasi, C. (2012). Dense lagrangian motion estimation with occlusions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1800–1807.
  33. Rockafellar, R. T. (1997). Convex analysis. Princeton Landmarks in Mathematics. Princeton, NJ: Princeton University Press. Reprint of the 1970 original.
  34. Rudin, L., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60, 259–268. doi:10.1016/0167-2789(92)90242-F.
  35. Sapiro, G. (1997). Color snakes. Computer Vision and Image Understanding, 68(2), 247–253.
  36. Schnörr, C. (1994). Segmentation of visual motion by minimizing convex non-quadratic functionals. In Proceedings of the twelfth international conference on pattern recognition (Vol. A, pp. 661–663). Jerusalem, Israel, Oct. 1994. IEEE Computer Society Press.
  37. Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600.
  38. Steinbruecker, F., Pock, T., & Cremers, D. (2009). Large displacement optical flow computation without warping. In International Conference on Computer Vision, pp. 1609–1614.
  39. Stuehmer, J., Gumhold, S., & Cremers, D. (2010). Real-time dense geometry from a handheld camera. In Pattern recognition (Proc. DAGM) (pp. 11–20), September 2010.
  40. Sun, D., Roth, S., Lewis, J. P., & Black, M. (2008). Learning optical flow. In European Conference on Computer Vision, pp. 83–97.
  41. Tian, Y., & Narasimhan, S. (2010). A globally optimal data-driven approach for image distortion estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1277–1284.
  42. Torresani, L., & Bregler, C. (2002). Space-time tracking. In European Conference on Computer Vision, pp. 801–812.
  43. Torresani, L., Hertzmann, A., & Bregler, C. (2008). Non-rigid structure-from-motion: Estimating shape and motion with hierarchical priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5), 878–892.
  44. Torresani, L., Yang, D., Alexander, E., & Bregler, C. (2001). Tracking and modeling non-rigid objects with rank constraints. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 493–500.
  45. Tschumperlé, D., & Deriche, R. (2005). Vector-valued image regularization with PDE's: A common framework for different applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4), 506–517. doi:10.1109/TPAMI.2005.87.
  46. Uras, S., Girosi, F., Verri, A., & Torre, V. (1988). A computational approach to motion perception. Biological Cybernetics, 60, 79–87. doi:10.1007/BF00202895.
  47. Varol, A., Salzmann, M., Tola, E., & Fua, P. (2009). Template-free monocular reconstruction of deformable surfaces. In International Conference on Computer Vision, pp. 1811–1818.
  48. Volz, S., Bruhn, A., Valgaerts, L., & Zimmer, H. (2011). Modeling temporal coherence for optical flow. In International Conference on Computer Vision, pp. 1116–1123.
  49. Wedel, A., Cremers, D., Pock, T., & Bischof, H. (2009). Structure- and motion-adaptive regularization for high accuracy optic flow. In International Conference on Computer Vision, pp. 1663–1668.
  50. Wedel, A., Pock, T., Braun, J., Franke, U., & Cremers, D. (2008). Duality TV-L1 flow with fundamental matrix prior. In Image and Vision Computing New Zealand, pp. 1–6.
  51. Wedel, A., Pock, T., Zach, C., Bischof, H., & Cremers, D. (2009). An improved algorithm for TV-L1 optical flow. In Statistical and geometrical approaches to visual motion analysis, LNCS (pp. 23–45). Springer, Berlin.
  52. Weickert, J. (1998). On discontinuity-preserving optic flow. In S. Orphanoudakis, P. Trahanias, J. Crowley, & N. Katevas (Eds.), Proceedings of the computer vision and mobile robotics workshop (pp. 115–122). Santorini, Greece, Sept 1998.
  53. Weickert, J., & Schnörr, C. (2001). A theoretical framework for convex regularizers in PDE-based computation of image motion. International Journal of Computer Vision, 45(3), 245–264.
  54. Weickert, J., & Schnörr, C. (2001). Variational optic flow computation with a spatio-temporal smoothness constraint. Journal of Mathematical Imaging and Vision, 14(3), 245–255.
  55. Werlberger, M., Trobin, W., Pock, T., Wedel, A., Cremers, D., & Bischof, H. (2009). Anisotropic Huber-L1 optical flow. In British Machine Vision Conference, Vol. 34, pp. 1–11.
  56. White, R., Crane, K., & Forsyth, D. (2007). Capturing and animating occluded cloth. ACM Transactions on Graphics.
  57. Zach, C., Pock, T., & Bischof, H. (2007). A duality based approach for realtime TV-L1 optical flow. In Pattern recognition (Proc. DAGM), pp. 214–223.

Articles from International Journal of Computer Vision are provided here courtesy of Springer
