Abstract
Parameterized Appearance Models (PAMs) (e.g. Eigentracking, Active Appearance Models, Morphable Models) are commonly used to model the appearance and shape variation of objects in images. While PAMs have numerous advantages relative to alternate approaches, they have at least two drawbacks. First, they are especially prone to local minima in the fitting process. Second, often few if any of the local minima of the cost function correspond to acceptable solutions. To solve these problems, this paper proposes a method to learn a cost function by explicitly enforcing that its local minima occur at, and only at, the places corresponding to the correct fitting parameters. To the best of our knowledge, this is the first paper to address the problem of learning a cost function to explicitly model local properties of the error surface to fit PAMs. Synthetic and real examples show improvement in alignment performance in comparison with traditional approaches.
1. Introduction
Since the early work of Sirovich and Kirby [21] parameterizing the human face using Principal Component Analysis (PCA) and the successful eigenfaces of Turk and Pentland [23], many computer vision researchers have used PCA techniques to construct linear models of optical flow, shape or graylevel [3, 10, 4, 6, 19, 14, 5, 11]. In particular, Parameterized Appearance Models (PAMs) (e.g. eigentracking [4], active appearance models [6, 10, 17, 9, 11], morphable models [5, 14]) have proven to be an appropriate statistical tool for modeling shape and appearance variation of objects in images. In PAMs, the appearance/shape models of objects are built by performing PCA on training data. Once the models have been constructed, finding the location/configuration of an object of interest in a testing image is achieved by minimizing a cost function w.r.t. some transformation (motion) parameters; this is referred to as the fitting process.
Although widely used, PAMs suffer from two problems in the fitting process. First, they are especially prone to local minima. Second, often few, if any, of the local minima of the cost function correspond to acceptable solutions. Figures 1a,d,f illustrate these problems. Fig. 1d plots the error surface constructed by translating the testing image (Fig. 1c) around its ground truth landmarks and computing the values of the cost function. The cost function is based on a PCA model constructed from labeled training data (Fig. 1a). Fig. 1f shows the contour plot of this error surface. As can be observed, any gradient-based optimization method is likely to get stuck at local minima, and will not converge to the global minimum. Moreover, the global minimum of this cost function is not at the desired position, the black dot of Fig. 1d, which corresponds to the correct landmarks' locations. These problems occur because the PCA model is constructed without considering the neighborhoods of the correct motion parameters (parameters that correspond to ground truth landmarks of training data). The neighborhoods determine the local minima properties of the error surface, and should be taken into account while constructing the models.
Figure 1.

Learning a better model for image alignment. (d,f): surface and contour plot of the PCA model. It has many local minima; (e, g): Local Minima Free PAM (LMF-PAM) method learns a better error surface to fit PAMs. This figure is best seen in color.
In this paper, we propose to learn the cost function (i.e. appearance model) that has a local minimum at the “expected” location and no other local minima in its neighborhood. This is done by enforcing constraints on the gradients of the cost function at the desired location and its neighborhood. Fig. 1e,g plot the error surface and contours of the learned cost function. This cost function has a local minimum in the expected place (black dot of Fig. 1e), and no other local minima nearby.
2. Previous work
Over the last decade, appearance models have become increasingly important in computer vision and graphics. In particular, PAMs have been proven useful for alignment, detection, tracking, and face synthesis [5, 4, 10, 6, 17, 19, 14, 11, 24]. This section reviews PAMs and gradient-based methods for the efficient alignment of high dimensional deformation models.
2.1. PAMs
PAMs [4, 10, 6, 19, 14, 5, 24] build the objects' appearance/shape representation from the principal components of training data. Let di ∈ ℜm×1 (see notation 1) be the ith sample of a training set D ∈ ℜm×n and U ∈ ℜm×k the first k principal components [13]. Once the model has been constructed (i.e. U is known), tracking/alignment is achieved by finding the motion parameter p that best aligns the data w.r.t. the subspace U, i.e.
minp,c ‖d(f(x, p)) − Uc‖²  (1)
Here x = [x1, y1, …xl, yl]T is the vector containing the coordinates of the pixels to track. f(x, p) is the function for geometric transformation; denote f (x, p) by [u1, v1, …, ul, vl]T. d is the image frame in consideration, and d(f(x, p)) is the appearance vector of which the ith entry is the intensity of image d at pixel (ui, vi). For affine and non-rigid transformations, (ui, vi) relates to (xi, yi) by:
[ui; vi] = [a1 a2; a4 a5][xi + δxi; yi + δyi] + [a3; a6]  (2)
with [δx1, δy1, …, δxl, δyl]T = Us cs, where Us is the non-rigid shape model learned by performing PCA on a set of registered shapes [7]. a, cs are the affine and non-rigid motion parameters respectively, and p = [a; cs].
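To make the model and warping operation concrete, the following is a minimal NumPy sketch (not the original implementation) of evaluating d(f(x, p)) and the cost in (1). It assumes a grayscale image stored as a 2D array, bilinear sampling via scipy.ndimage.map_coordinates, an appearance basis U with orthonormal columns, and the hypothetical helper names warp_appearance and pam_cost.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_appearance(d, x_mean, p, U_s):
    """Evaluate d(f(x, p)): deform the mean-shape coordinates with the non-rigid
    coefficients, apply the affine part, and sample intensities bilinearly.
    d: (H, W) grayscale image; x_mean: (2l,) coords [x1, y1, ..., xl, yl];
    p: (6 + q,) parameters [a; c_s]; U_s: (2l, q) non-rigid shape basis."""
    a, c_s = p[:6], p[6:]
    xy = (x_mean + U_s @ c_s).reshape(-1, 2)           # (l, 2): x, y after non-rigid part
    M = np.array([[a[0], a[1]], [a[3], a[4]]])         # affine 2x2 block
    t = np.array([a[2], a[5]])                         # translation
    uv = xy @ M.T + t                                  # warped coordinates (u, v)
    # map_coordinates indexes (row, col) = (v, u); order=1 gives bilinear sampling
    return map_coordinates(d, [uv[:, 1], uv[:, 0]], order=1)

def pam_cost(d, x_mean, p, U_s, U):
    """E(p) = min_c ||d(f(x, p)) - U c||^2 = ||(I - U U^T) d(f(x, p))||^2,
    assuming U has orthonormal columns."""
    g = warp_appearance(d, x_mean, p, U_s)
    r = g - U @ (U.T @ g)
    return float(r @ r)
```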
2.2. Optimization for PAMs
Given an image d, PAM tracking/alignment algorithms optimize (1). Due to the high dimensionality of the motion space, a standard approach to efficiently search over the parameter space is to use gradient-based methods [1, 7, 17, 4, 8, 25]. To compute the gradient of the cost function given in (1), it is common to use a Taylor series expansion to approximate d(f(x, p + δp)) by d(f(x, p)) + Jd(p)δp, where Jd(p) = ∂d(f(x, p))/∂p is the Jacobian of the image d w.r.t. the motion parameter p [16]. Once linearized, a standard approach is to use the Gauss-Newton method for optimization [2, 4]. Other approaches learn an approximation of the Jacobian matrix with linear [7] or non-linear [20, 15] regression.
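As an illustration of this linearize-then-solve strategy, here is a sketch of a single Gauss-Newton step for the eigentracking form of (1), using a finite-difference Jacobian for simplicity; the function names and the project-out formulation of the residual are our own choices for this sketch, not necessarily those of the cited methods.

```python
import numpy as np

def numeric_jacobian(warp_fn, p, eps=1e-4):
    """Finite-difference approximation of J_d(p) = d d(f(x, p)) / d p  (m x |p|)."""
    g0 = warp_fn(p)
    J = np.zeros((g0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (warp_fn(p + dp) - g0) / eps
    return J

def gauss_newton_step(warp_fn, p, U):
    """One Gauss-Newton update for min_p ||(I - U U^T) d(f(x, p))||^2."""
    g = warp_fn(p)
    r = g - U @ (U.T @ g)                    # residual outside the appearance subspace
    J = numeric_jacobian(warp_fn, p)
    Jp = J - U @ (U.T @ J)                   # Jacobian projected the same way
    delta, *_ = np.linalg.lstsq(Jp, -r, rcond=None)
    return p + delta
```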
Over the last few years, several strategies for improving the fitting performance have been proposed. For example, Black & Anandan [4] and Cootes & Taylor [7] proposed using multi-resolution schemes, Xiao et al. [26] proposed using 3D models to constrain 2D solutions, de la Torre et al. proposed learning filters to achieve robustness to local minima [9], and de la Torre & Black [8] and Baker & Matthews [1] learned a PCA model invariant to rigid and non-rigid transformations. Although these methods show significant performance improvement, they do not directly address the problem of learning a cost function with no local minima. In this paper, we deliberately learn a cost function which has local minima at, and only at, the desired places.
3. Learning parameters of the cost functions
Gradient-based algorithms, such as the ones discussed in the previous section, might not converge to the correct location (i.e. correct motion parameters) for several reasons. First, gradient-based methods are susceptible to being stuck at local minima. Second, even when the optimizer converges to a global minimum, the global minimum might not correspond to the correct motion parameters. These two problems occur primarily because PCA has limited generalization capabilities to model appearance variation. This section proposes a method to learn cost functions that do not exhibit these two problems in training data.
3.1. A generic cost function for alignment
This section proposes a generic quadratic error function into which many PAMs can be cast. The quadratic error function has the form:
E(d, p) = d(f(x, p))T A d(f(x, p)) + 2bT d(f(x, p))  (3)
Here A ∈ ℜm×m and b ∈ ℜm×1 are the fixed parameters of the function, and A is symmetric. This function is the general form of many cost functions used in the literature including Active Appearance Models [6], Eigentracking [4], and template tracking [16, 18]. For instance, consider the cost function given in (1). If p is fixed, the optimal c that minimizes (1) can be obtained using c = UTd(f(x, p)). Substituting this back into (1) and performing some basic algebra, (1) is equivalent to: minp d(f(x, p))T(Im − UUT)d(f(x, p)). Thus (1) is a special case of (3), with A = Im − UUT, and b = 0m.
For template tracking, the cost function is typically the sum of squared differences: ‖d(f(x, p)) − dref‖², where dref is the reference template. This cost function is equivalent to: d(f(x, p))T d(f(x, p)) − 2drefT d(f(x, p)) + const. Thus the cost function used in template tracking is also a special case of (3) with A = Im and b = −dref.
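The two reductions above can be checked numerically. The short script below verifies on random data (stand-ins for d(f(x, p)) and dref) that both cost functions coincide with the generic quadratic form (3) for the stated choices of A and b (the template case up to the constant drefT dref).

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 50, 5
U, _ = np.linalg.qr(rng.standard_normal((m, k)))   # orthonormal appearance basis
g = rng.standard_normal(m)                         # stand-in for d(f(x, p))
d_ref = rng.standard_normal(m)                     # stand-in for the template

def E(A, b, g):                                    # generic quadratic form (3)
    return g @ A @ g + 2 * b @ g

# Eigentracking / AAM: min_c ||g - U c||^2  ==  g^T (I - U U^T) g
c_opt = U.T @ g
assert np.isclose(np.sum((g - U @ c_opt) ** 2), E(np.eye(m) - U @ U.T, np.zeros(m), g))

# Template SSD: ||g - d_ref||^2  ==  g^T I g + 2 (-d_ref)^T g + d_ref^T d_ref
assert np.isclose(np.sum((g - d_ref) ** 2), E(np.eye(m), -d_ref, g) + d_ref @ d_ref)
print("both costs match the generic quadratic form (3)")
```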
3.2. Desired properties of cost functions
As discussed previously, it is desirable that the cost function have minima at, and only at, the ‘right’ places. In this section, we formulate this requirement as an optimization problem over A and b.
Let {di} be a set of training images containing the objects of interest (e.g. faces), and assume the landmarks for the object shapes are available (e.g. manually labeled facial landmarks as in Fig. 5a). Let si be the vector containing the landmark coordinates of image di. Given {si}, we perform Procrustes analysis [7] and build the shape model as follows. First, the mean shape s̄ is calculated. Second, we compute ai, the affine parameter that best transforms s̄ to si, and let ai⁻¹ denote the inverse affine transformation of ai. Third, ŝi is obtained by applying this inverse affine transformation to si (warping toward the mean shape). Next, we perform PCA on {ŝi} to construct Us, a basis for non-rigid shape variation. We then compute cs,i, the coefficients of ŝi − s̄ w.r.t. the basis Us. Finally, let pi = [ai; cs,i]; pi is the parameter of image di w.r.t. our shape model. Notably, the shape model and {pi} are derived independently of the appearance model. The appearance model (i.e. the cost function E(d, p)) is what needs to be learned.
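A simplified sketch of this shape-model construction is given below. It replaces full Procrustes analysis with a least-squares affine fit and uses an SVD for the PCA step; the function names and the ordering of the parameter vector are illustrative assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map (M, t) with M @ src_i + t ≈ dst_i; src, dst: (l, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(X, dst, rcond=None)        # (3, 2)
    return W[:2].T, W[2]

def build_shape_model(shapes, q=4):
    """shapes: list of (l, 2) landmark arrays, one per training image.
    Returns the mean shape, the non-rigid basis U_s, and the parameters p_i."""
    s_bar = np.mean(shapes, axis=0)                    # mean shape
    affines, residuals = [], []
    for s in shapes:
        M, t = fit_affine(s_bar, s)                    # a_i: mean shape -> s_i
        s_hat = (s - t) @ np.linalg.inv(M).T           # warp s_i toward the mean shape
        residuals.append((s_hat - s_bar).ravel())      # [x1, y1, ..., xl, yl] ordering
        affines.append(np.array([M[0, 0], M[0, 1], t[0], M[1, 0], M[1, 1], t[1]]))
    R = np.array(residuals)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)   # PCA on aligned shape residuals
    U_s = Vt[:q].T                                     # (2l, q) non-rigid basis
    p = [np.concatenate([a, U_s.T @ r]) for a, r in zip(affines, residuals)]
    return s_bar, U_s, p
```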
Figure 5.

(a) example of landmarks associated with each face (red dots), (b) example of shape distortion (yellow pluses), (c) example of patches for appearance modeling.
For E(di, p) to have a local minimum at the right place, pi must be a local minimum of E(di, p). Theoretically, this requires the gradient ∂E(di, p)/∂p to vanish at pi, i.e.
∂E(di, p)/∂p |p=pi = 0  (4)
To learn a cost function that has few local minima, it is necessary to consider pi's neighborhoods. Let 𝒫i = {p : lb ≤ ‖p − pi‖2 ≤ ub}, 𝒫i^in = {p : ‖p − pi‖2 < lb}, and 𝒫i^out = {p : ‖p − pi‖2 > ub}. Here lb is chosen such that 𝒫i^in is a set of neighbor parameters that are very close to pi; it is satisfactory for a fitting algorithm to converge to a point in 𝒫i^in. ub is chosen so that the fitting algorithm is guaranteed to be initialized at a point in 𝒫i or 𝒫i^in. In most applications, such ub exists. For example, for tracking problems, ub can be set to the maximum movement of the object being tracked between two consecutive frames. Fig. 2 depicts the relationship between 𝒫i^in, 𝒫i, and 𝒫i^out.
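For completeness, a simple way to obtain samples from 𝒫i (used later when setting up the constraints of Sec. 4) is rejection sampling of Gaussian perturbations around pi, as sketched below; the perturbation scale and the function name are arbitrary choices for this illustration.

```python
import numpy as np

def sample_annulus(p_i, lb, ub, n_samples, seed=0):
    """Draw parameters p with lb <= ||p - p_i||_2 <= ub (the set P_i above)
    by rejection sampling Gaussian perturbations around the ground truth p_i."""
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < n_samples:
        p = p_i + rng.normal(scale=ub / 2.0, size=p_i.shape)
        if lb <= np.linalg.norm(p - p_i) <= ub:
            samples.append(p)
    return np.array(samples)
```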
Figure 2.

Neighborhoods around the ground truth motion parameter pi (red dot). 𝒫i^in: region inside the orange circle; it is satisfactory for fitting algorithms to converge to this region. 𝒫i^out: region outside the blue circle; alignment algorithms will not be initialized in this region. 𝒫i: shaded region, the region where constraints on gradients are enforced.
For a gradient descent algorithm to converge to pi or a point close enough to pi, it is necessary that E(di, ·) have no local minima in 𝒫i. This implies that ∂E(di, p)/∂p does not vanish for p ∈ 𝒫i. Notably, it is not necessary to enforce similar constraints for 𝒫i^in or 𝒫i^out because of the way lb and ub are chosen. Another desirable property is that each iteration of gradient descent advances closer to the correct position. Because gradient descent walks against the gradient direction at every iteration, we would like the opposite direction of the gradient at a point p ∈ 𝒫i to be similar to the optimal walking direction pi − p. This quantity can be measured as the projection of the walking direction onto the optimal direction. Fig. 3 illustrates the rationale of this requirement. This requirement leads to the constraints:
Figure 3.

pi: desired convergence location. Blue arrows: gradient vectors, red arrows: walking directions of gradient descent algorithm, orange arrows: optimal directions to the desired location. Performing gradient descent at p advances closer to pi while performing gradient descent at p′ moves away from pi.
−(∂E(di, p)/∂p)T (pi − p) > 0,  ∀p ∈ 𝒫i, ∀i  (5)
Equations (4) and (5) specify the constraints for the ideal cost function. However, these constraints might be too stringent. Therefore, we propose to relax the constraints to get the optimization problem:
minA,b,ξ≥0  Σi ‖∂E(di, p)/∂p |p=pi‖² + C Σi Σp∈𝒫i ξi,p
s.t.  −(∂E(di, p)/∂p)T (pi − p) ≥ −ξi,p,  ∀p ∈ 𝒫i, ∀i  (6)
Here the gradient ∂E(di, p)/∂p at pi is required to be small instead of strictly zero. The ξi,p's are slack variables for the constraints in (5), which allow for penalized constraint violation. C is the parameter controlling the trade-off between having few local minima and having local minima at the right places.
The gradient of the function E(d, p) plays a fundamental role in the above optimization problem. To compute the gradient ∂E(d, p)/∂p, it is common to use a first-order Taylor series expansion to approximate d(f(x, p + δp)) by d(f(x, p)) + Jd(p)δp, where Jd(p) = ∂d(f(x, p))/∂p is the Jacobian (spatial intensity gradient) of the image d w.r.t. the motion parameter p [16]. This yields:
∂E(d, p)/∂p ≈ 2Jd(p)T (A d(f(x, p)) + b)  (7)
Substituting (7) into (6), we obtain a quadratic optimization problem with linear constraints over A and b.
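Concretely, each training pair (di, p) with p ∈ 𝒫i contributes one linear constraint on the unknowns A and b. The sketch below (with hypothetical function names, and with vec(A) taken in row-major order) evaluates the gradient (7) and builds the coefficient vector of the descent-direction constraint −(∂E/∂p)T(pi − p) ≥ 0; the symmetry of A would be imposed separately.

```python
import numpy as np

def gradient_E(A, b, g, J):
    """Gradient (7): dE/dp ≈ 2 J^T (A g + b), with g = d(f(x, p)) and J = J_d(p)."""
    return 2.0 * J.T @ (A @ g + b)

def descent_constraint_row(g, J, p, p_i):
    """Coefficients of the constraint -(dE/dp)^T (p_i - p) >= 0 as a linear
    function of the unknowns [vec(A); b]:
      -(dE/dp)^T (p_i - p) = v^T A g + v^T b,   with v = -2 J (p_i - p),
    so the coefficient of vec(A) (row-major) is outer(v, g) and that of b is v."""
    v = -2.0 * J @ (p_i - p)
    return np.concatenate([np.outer(v, g).ravel(), v])
```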
3.3. Practical issues and alternative fitting methods
In practice, there is an issue regarding the optimization of (6): the small components of the gradient ∂E(d, p)/∂p tend to be neglected when optimizing (6). This occurs due to the magnitude difference between some columns of Jd(p). For example, in (2), the magnitudes of the Jacobians of d(f(x, p)) w.r.t. a1, a2, a4, a5 can be much larger than the magnitudes of the Jacobians of d(f(x, p)) w.r.t. a3, a6.
To address this concern, we consider an alternative optimization strategy whose update rule at the kth iteration is:
pk+1 = pk − Hd(pk)⁻¹Jd(pk)T (A d(f(x, pk)) + b),  where Hd(pk) = Jd(pk)TJd(pk)  (8)
The update rule of the above algorithm is a variant of Newton iteration. Intuitively, Hd(pk) is similar to the Hessian of E(d, p) at pk, and it acts as a normalization matrix for the gradient. This algorithm is indeed a reasonable optimization scheme for cost functions in which A is symmetric positive semidefinite with all eigenvalues less than or equal to 1. See Theorem 1 in the Appendix for the proof.
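A sketch of one iteration of this normalized update is given below, assuming caller-supplied callables that return d(f(x, p)) and Jd(p) for the current parameters; it solves the normalized step rather than inverting Hd(p) explicitly.

```python
import numpy as np

def normalized_update(p, warp_fn, jac_fn, A, b):
    """One iteration of update rule (8):
    p_{k+1} = p_k - (J^T J)^{-1} J^T (A d(f(x, p_k)) + b)."""
    g = warp_fn(p)          # d(f(x, p_k)), the warped appearance vector
    J = jac_fn(p)           # J_d(p_k), the m x |p| Jacobian
    H = J.T @ J             # Hessian-like normalization matrix H_d(p_k)
    step = np.linalg.solve(H, J.T @ (A @ g + b))
    return p - step
```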
Similar to the case of gradient descent, requiring the incremental updates to vanish at, and only at, the places corresponding to acceptable solutions yields the following optimization problem:
minA,b,ξ≥0  Σi ‖Δdi(pi)‖² + C Σi Σp∈𝒫i ξi,p
s.t.  Δdi(p)T (pi − p) ≥ −ξi,p,  ∀p ∈ 𝒫i, ∀i,
where Δdi(p) = Hdi(p)⁻¹Jdi(p)T (A di(f(x, p)) + b) denotes the incremental update of (8) for image di at p  (9)
A is also constrained to be a symmetric positive semidefinite matrix whose eigenvalues are less than or equal to one. By incorporating the ideas of maximal margin and regularization, we obtain:
minA∈𝒮m,b,ξ≥0  Σi ‖Δdi(pi)‖² + C Σi Σp∈𝒫i ξi,p + C2 Ω(A, b)
s.t.  Δdi(p)T (pi − p) ≥ C3 − ξi,p,  ∀p ∈ 𝒫i, ∀i  (10)
where 𝒮m denotes the set of all m × m symmetric matrices of which all eigenvalues are non-negative and less than or equal to one. Ω(A, b) is the regularization term for A and b, C2 is the weight for the regularization term, and C3 is the user-defined margin size. Since Δdi(pi) is linear in terms of A and b, this is a quadratic programming problem with linear constraints, provided the requirement A ∈ 𝒮m can be described by linear constraints.
Of course, one can derive a similar learning problem for A and b where the Newton method is the optimizer of choice. The incremental update in Newton iteration is:
δp = −(Jd(pk)T A Jd(pk))⁻¹Jd(pk)T (A d(f(x, pk)) + b)  (11)
However, each Newton iteration has to invert Jd(pk)T AJd(pk). As a result, learning A and b becomes much harder because the optimization problem is no longer quadratic with linear constraints.
4. Special cases and experiments
Sec. 3.3 proposes a method for learning generic A and b. However, in specific situations, A and b can be further parameterized. The benefits of further parameterization are threefold. First, the number of parameters to learn can be reduced. Second, the relationship between A and b can be established. Third, the constraint that A ∈ 𝒮m can be replaced by a set of linear constraints. This section provides the formulation for two special cases, namely weighted template alignment and weighted-basis AAM alignment. Experimental results on synthetic and real data are included.
4.1. Weighted template alignment
As shown in Sec. 3.1, template alignment is a special case of (3) in which A = Im, and b = −dref. In template alignment, pixels of the template are weighted equally; however, there is no reason why this is optimal. Here, we propose learning the weights of the template pixels to avoid local minima in template matching.
Consider the weighted sum of squared differences: (d(f(x, p)) − dref)Tdiag(w)(d(f(x, p)) − dref), where w is the weight vector for the template's pixels. This cost function is equivalent to (3) with A = diag(w) and b = −diag(w)dref. The constraint A ∈ 𝒮m can be imposed by requiring 0 ≤ wi ≤ 1. Furthermore, in this setting (10) reduces to a linear programming problem with linear constraints over w.
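The parameterization is easy to state in code. The sketch below builds A = diag(w) and b = −diag(w)dref and numerically checks, on random data, that the weighted SSD matches the generic form (3) up to its constant term; the names and data are illustrative.

```python
import numpy as np

def weighted_template_Ab(w, d_ref):
    """Weighted SSD (g - d_ref)^T diag(w) (g - d_ref) as an instance of (3):
    A = diag(w), b = -diag(w) d_ref; requiring 0 <= w_i <= 1 keeps A in S_m."""
    return np.diag(w), -w * d_ref

# sanity check on random data
rng = np.random.default_rng(0)
m = 32
w = rng.uniform(0, 1, m)
d_ref, g = rng.standard_normal(m), rng.standard_normal(m)
A, b = weighted_template_Ab(w, d_ref)
lhs = (g - d_ref) @ (w * (g - d_ref))                # weighted SSD
rhs = g @ A @ g + 2 * b @ g + d_ref @ (w * d_ref)    # form (3) plus its constant
assert np.isclose(lhs, rhs)
```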
To demonstrate this idea, we create a synthetic template of an isotropic Gaussian (Fig. 4a). Suppose the task is to locate the template inside an image containing the template (Fig. 4c), starting at an arbitrary location. Fig. 4d plots the error surface of the naive cost function (sum of squared differences). The value of this error surface at a particular pixel (x, y) is calculated by computing the sum of squared differences between the template and the circular patch centered at (x, y). Similarly, the error surface of the learned cost function (weighted sum of squared differences) is calculated and displayed in Fig. 4e. The learned template weights are shown in Fig. 4b; brighter pixels mean higher weights. As can be seen, the naive cost function has a fence of local maxima surrounding the template location. This prevents alignment algorithms from converging to the desired location. The learned cost function is convex, and therefore, is more suitable for this particular template.
Figure 4.

Learning to weight template's pixels. (a) synthetic template of an isotropic Gaussian. (b) the learned weights, brighter pixels mean higher weights. (c) an image containing the template. (d) error surface of the sum of squared differences. (e) error surface of the weighted sum of squared differences with the learned weights given in (b).
The template's weights given in Fig. 4b are learned by optimizing (10) with the following parameter settings: Ω(A, b) = 0, C2 = 0, C3 = 10⁻², C = 1. The linear constraints are reduced to a set of 5000 constraints obtained by random sampling. How to deal with infinitely many constraints is discussed in more detail in Sec. 4.2.
4.2. Weighted-basis for AAM alignment
As shown in Sec. 3.1, AAM alignment is a special case of (3) in which A = Im − UUT and b = 0m. U is the set of the k first eigenvectors from the total of K PCA basis vectors of the training data subspace. k (≤ K) is usually chosen experimentally. In this section, we propose to use all K eigenvectors, but weigh them differently. Specifically, we learn A of the form A = Im − Σi=1..K λi ui uiT, where ui is the ith eigenvector. To ensure that A ∈ 𝒮m, we require 0 ≤ λi ≤ 1. Let w = [λT bT]T. Substituting this into (10), we get a quadratic programming problem with linear constraints on w.
To demonstrate this idea, we perform experiments on the Multi-PIE database [12]. This database consists of facial images of 337 subjects taken under different illuminations, expressions and poses. We only make use of the directly-illuminated frontal face images under five expressions (smile, disgust, squint, surprise and scream). Our dataset contains 1100 images; 400 are selected for training, 200 are used for validation (parameter tuning), and the rest are reserved for testing. Each face is manually labeled with 68 landmarks, as shown in Fig. 5a. Images are downsampled to 120 × 160 pixels.
The shape model is built as described in Sec. 3.2. The final shape model requires 10 coefficients (6 affine + 4 non-rigid) to describe a shape. For object appearance, we extract intensity values of pixels inside the patches located at the landmarks (Fig. 5c).
The training data is further divided into two subsets, one containing 300 images and the other containing 100 images. U is obtained by performing PCA on the subset of 300 images. The second subset is used to set up the optimization problem (10). For better generalization, (10) is constructed without using images in the first training subset. Because 𝒫i contains infinitely many points, we restrict our attention to a set of 200 random samples from 𝒫i. The random samples are drawn by introducing random Gaussian perturbations to the correct shape parameter pi.
Following the approach of Tsochantaridis et al. [22] for minimizing a quadratic function with an exponentially large number of linear constraints, we maintain a smaller subset of active constraints S and optimize (10) iteratively. We repeat the following steps for 10 iterations: (i) empty S; (ii) randomly choose 20 training images; (iii) for each chosen training image di, find the 100 most violated constraints from 𝒫i and include them in S; (iv) run quadratic programming with the reduced set of constraints.
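A skeleton of this constraint-selection loop is sketched below. It is deliberately generic: the caller supplies a violation(w, i, p) function measuring how strongly the constraint for sample p of image di is violated under the current parameters w, and a solve_qp(S) routine that solves (10) restricted to the active set S. Both names, and the initialization from an unconstrained solve, are our own assumptions for this sketch rather than details from the text.

```python
import numpy as np

def active_set_training(images, neighborhoods, violation, solve_qp,
                        n_iters=10, n_images=20, n_per_image=100, seed=0):
    """Iterative constraint selection in the spirit of Tsochantaridis et al. [22].
    neighborhoods[i] holds the sampled parameters from P_i for image i,
    violation(w, i, p) scores how violated the constraint (i, p) is under the
    current parameters w, and solve_qp(S) solves (10) restricted to the
    active constraint set S. All three are supplied by the caller."""
    rng = np.random.default_rng(seed)
    w = solve_qp([])                                  # start from an unconstrained solve
    for _ in range(n_iters):
        S = []                                        # (i) empty the active set
        chosen = rng.choice(len(images), size=n_images, replace=False)   # (ii)
        for i in chosen:                              # (iii) most violated constraints
            scores = [(violation(w, i, p), (i, p)) for p in neighborhoods[i]]
            scores.sort(key=lambda t: t[0], reverse=True)
            S.extend(c for _, c in scores[:n_per_image])
        w = solve_qp(S)                               # (iv) QP on the reduced set
    return w
```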
Testing data are generated by randomly perturbing the components of pi, the correct shape parameters of test image di. Perturbation amounts are generated from a zero mean Gaussian distribution with standard deviation PerMag × [0.05 0.05 1 0.05 0.05 1 2 2 2 2]T. PerMag controls the overall difficulty of the testing data. The relative perturbation amounts of the shape coefficients are chosen to simulate possible motion in tracking, and are estimated visually. Fig. 5b shows an example of shape perturbation: the ground truth landmarks are marked in red (circles), while the perturbed shape is shown in yellow (pluses).
Table 1 describes the experimental results with four difficulty levels of testing data (controlled by PerMag). The performance of the learned cost function is compared with four other cost functions constructed using PCA with popular energy settings (70%, 80%, 90%, and 100%). As can be observed, when the amount of perturbation is small, PCA models with higher energy levels perform better. However, as the amount of perturbation increases, PCA models with lower energy levels perform better. This suggests that cost functions using fewer basis vectors have fewer local minima, while cost functions using more basis vectors are more likely to have local minima at the ‘right’ places. Thus it is unclear how much energy the PCA model should preserve. On the other hand, the learned cost function performs significantly better than the PCA models for most difficulty levels. In this experiment, we use C = 2, C2 = 0.1, and C3 = 0.01. The parameters are tuned using the validation set.
Table 1.
Alignment results of different methods for four difficulty levels of testing data (PerMag). Initial is the initial amount of perturbation before running any alignment algorithm. PCA e% is the cost function constructed using PCA preserving e% of the energy. The table shows the means and standard deviations of misalignment (averaged over the 68 landmarks and over the testing data), measured in pixels.
| PerMag | 0.75 | 1.00 | 1.25 | 1.5 |
|---|---|---|---|---|
| Initial | 0.75±.25 | 1.08±.38 | 1.37±.52 | 1.54±.54 |
| PCA 100% | 0.37±.18 | 0.41±.25 | 0.55±.45 | 0.60±.51 |
| PCA 90% | 0.36±.20 | 0.43±.33 | 0.47±.36 | 0.60±.65 |
| PCA 80% | 0.40±.23 | 0.43±.34 | 0.49±.37 | 0.57±.50 |
| PCA 70% | 0.41±.20 | 0.43±.25 | 0.47±.30 | 0.55±.46 |
| Ours | 0.37±.19 | 0.40±.25 | 0.43±.29 | 0.48±.39 |
5. Conclusion
In this paper, we have proposed a method for learning the cost functions for PAMs. We directly address the problem of learning cost functions that have local minima at and only at the desired places. The task of learning a cost function is formulated as optimizing a quadratic function under some linear constraints. To the best of our knowledge, this is the first paper that addresses this problem. Encouraging results have been achieved in the context of template matching and AAM fitting. Further work needs to address how to select the most interesting points in the error surface to reduce the number of constraints in the optimization.
Acknowledgments
This material is based upon work supported by the U.S. Naval Research Laboratory under Contract No. N00173-07-C-2040 and National Institute of Health Grant R01 MH 051435. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the U.S. Naval Research Laboratory.
Appendix
This section states and proves a theorem used to justify the optimization algorithm given in (8).
Theorem 1
Consider an m-dimensional function f(x) of a p-dimensional variable x, and suppose we have to minimize the function E(x) = f(x)T A f(x) + 2bT f(x), where A ∈ 𝒮m. Consider an iterative optimization method which has the following update rule:
xk+1 = xk − (J(xk)TJ(xk))⁻¹J(xk)T (A f(xk) + b),  where J(x) = ∂f(x)/∂x  (12)
The above optimization method, when started sufficiently close to a regular local minimum, will converge to that local minimum. Here, a point x0 is said to be regular if H is not singular and the Taylor series of f(·) converges for every point in the neighborhood of x0.
Proving Theorem 1 requires two lemmas. We now state and prove those two lemmas.
Lemma 1
A ∈ 𝒮m if and only if Im − A ∈ 𝒮m.
Proof
This lemma can be proven easily, based on the relation between the eigenvalues of the two matrices:
λj(Im − A) = 1 − λj(A),  j = 1, …, m  (13)
Lemma 2
A ∈ 𝒮m if and only if there exist a positive integer k, scalars αi, and matrices Bi such that:
(i) BiTBi is invertible for every i,
(ii) A = Σi=1..k αi Bi(BiTBi)⁻¹BiT, and
(iii) αi ≥ 0 for every i and Σi=1..k αi ≤ 1.
Proof for sufficiency conditions
Suppose there exist k, αi's, and Bi's that satisfy all three conditions above. Because A is a linear combination of symmetric matrices, A is also symmetric. We only need to prove that A is positive semidefinite with all eigenvalues less than or equal to 1. Consider vTAv for an arbitrary vector v ∈ ℜm:
vTAv = Σi αi vTBi(BiTBi)⁻¹BiTv  (14)
We know that Bi(BiTBi)⁻¹BiT is a projection matrix and Bi(BiTBi)⁻¹BiTv is the projection of v onto the column space of Bi. Thus we have vTBi(BiTBi)⁻¹BiTv ≤ vTv. Therefore:
vTAv ≤ (Σi αi) vTv ≤ vTv  (15)
Furthermore, we have vTAv ≥ 0 because vTBi(BiTBi)⁻¹BiTv ≥ 0 and αi ≥ 0 ∀i. Combining this with the inequality in (15), we have 0 ≤ vTAv ≤ vTv. Since these inequalities hold for an arbitrary vector v ∈ ℜm, A must be an element of 𝒮m.
Proof for necessary conditions
Suppose A ∈ 𝒮m. Consider the singular value decomposition of A, A = UΛUT. Here, the columns of U = [u1 … um] are orthonormal vectors and Λ is a diagonal matrix, Λ = diag([λ1, …, λm]) with 0 ≤ λi ≤ 1 ∀i. Without loss of generality, suppose λ1 ≥ λ2 ≥ … ≥ λm. We have:
A = Σi=1..m λi ui uiT  (16)
Let αi = λi − λi+1 for i = 1, …, m − 1, and αm = λm. Let Bi = [u1 … ui] for i = 1, …, m. Since {u1, …, ui} is a set of orthonormal vectors, BiTBi = Ii, an identity matrix. Therefore, Bi(BiTBi)⁻¹BiT = BiBiT = Σj=1..i uj ujT. Hence:
Σi=1..m αi Bi(BiTBi)⁻¹BiT = Σj=1..m (Σi=j..m αi) uj ujT = Σj=1..m λj uj ujT = A  (17)
Finally, we have αi ≥ 0 ∀i and Σi=1..m αi = λ1 ≤ 1. This completes our proof of Lemma 2. □
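The construction used in this proof is easy to verify numerically. The snippet below draws a random A ∈ 𝒮m, rebuilds it from the αi and Bi defined above, and checks that the αi are non-negative and sum to λ1 ≤ 1; it is only a sanity check, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6
# random symmetric A with eigenvalues in [0, 1] (an element of S_m)
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
lam = np.sort(rng.uniform(0, 1, m))[::-1]          # lam_1 >= ... >= lam_m
A = Q @ np.diag(lam) @ Q.T

# construction from the proof: alpha_i = lam_i - lam_{i+1}, B_i = [u_1 ... u_i]
alpha = np.append(lam[:-1] - lam[1:], lam[-1])
A_rec = np.zeros((m, m))
for i in range(m):
    B = Q[:, :i + 1]
    A_rec += alpha[i] * B @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(A, A_rec)
assert np.all(alpha >= 0) and alpha.sum() <= 1 + 1e-10   # sum(alpha) = lam_1 <= 1
```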
Proof of Theorem 1
From Lemmas 1 and 2, we know that there exist αi ≥ 0 with Σi αi ≤ 1 and matrices Bi with BiTBi invertible such that Im − A = Σi=1..k αi Bi(BiTBi)⁻¹BiT. To prove Theorem 1, let us first consider the optimization of the following function:
E2(x, {ci}) = Σi=1..k αi ‖f(x) − Bici‖² + (1 − Σi=1..k αi)‖f(x)‖² + 2bTf(x)  (18)
with auxiliary variables {ci}. One way to optimize this function is using coordinate descent, alternating between:
minimizing E2 w.r.t. x while fixing {ci},
minimizing E2 w.r.t. {ci} while fixing x.
To minimize E2 w.r.t. x while fixing {ci}, we can use the Newton method. Using the first-order Taylor approximation, we have f(x + δx) ≈ f(x) + Jδx with
J = ∂f(x)/∂x  (19)
Substituting this approximation into E2 yields a quadratic function of δx:
E2(x + δx, {ci}) ≈ Σi αi ‖f(x) + Jδx − Bici‖² + (1 − Σi αi)‖f(x) + Jδx‖² + 2bT(f(x) + Jδx)  (20)
Setting the derivative with respect to δx to zero gives
JTJ δx = −JT[Σi αi(f(x) − Bici) + (1 − Σi αi)f(x) + b]  (21)
Therefore, we have the Newton update rule:
xnew = xold − (JTJ)⁻¹JT[Σi αi(f(xold) − Bici) + (1 − Σi αi)f(xold) + b]  (22)
When x is fixed, the {ci} that globally minimize E2 are:
ci = (BiTBi)⁻¹BiTf(x),  i = 1, …, k  (23)
Combining (22) and (23), we have the update rule for minimizing E2: xnew = xold − (JTJ)⁻¹JT[Af(xold) + b]. This update rule is exactly the same as the update rule given in (12). As a result, (12) will always lead us to a local minimum of E2.
We now prove that a local minimum of E2 obtained by (12) is also a local minimum of E. Suppose (x0, {ci0}) is a local minimum of E2. Then there exists ε1 > 0 such that E2(x0, {ci0}) ≤ E2(x, {ci}) for all (x, {ci}) within a distance ε1 of (x0, {ci0}). Because (BiTBi)⁻¹BiTf(x) is a continuous function of x, we can always find ε2 > 0 small enough such that, for all δx with ‖δx‖2 ≤ ε2, the optimal coefficients {ci(x0 + δx)} given by (23) lie within ε1 of {ci0}. On the other hand, one can easily verify that E(x) = min{ci} E2(x, {ci}). Therefore, there exists ε2 > 0 such that E(x0) ≤ E(x0 + δx) for all ‖δx‖2 ≤ ε2. Hence, x0 must be a local minimum of E.
To sum up, we have shown that (12) will converge to a local minimum of E2. Furthermore, a local minimum of E2 found by (12) is also a local minimum of E. Thus the update rule given in (12) is guaranteed to converge to a local minimum of E. This concludes our proof of Theorem 1. □
Footnotes
Bold uppercase letters denote matrices (e.g. D), bold lowercase letters denote column vectors (e.g. d). dj represents the jth column of the matrix D. dij denotes the scalar in the ith row and jth column of the matrix D. Non-bold letters represent scalar variables. 1k ∈ ℜk×1 is a column vector of ones. 0k ∈ ℜk×1 is a column vector of zeros. Ik ∈ ℜk×k is the identity matrix. tr(D) = Σidii is the trace of the square matrix D. ‖d‖2 = √(dTd) designates the Euclidean norm of d. ‖D‖F = √(tr(DTD)) is the Frobenius norm of D. diag(·) is the operator that extracts the diagonal of a square matrix or constructs a diagonal matrix from a vector.
Contributor Information
Minh Hoai Nguyen, Email: minhhoai@cmu.edu.
Fernando De la Torre, Email: ftorre@cs.cmu.edu.
References
- 1. Baker S, Matthews I. Lucas-Kanade 20 years on: a unifying framework. International Journal of Computer Vision. 2004 March;56(3):221–255.
- 2. Bergen JR, Anandan P, Hanna KJ, Hingorani R. Hierarchical model-based motion estimation. European Conference on Computer Vision. 1992:237–252.
- 3. Black MJ, Fleet DJ, Yacoob Y. Robustly estimating changes in image appearance. Computer Vision and Image Understanding. 2000;78(1):8–31.
- 4. Black MJ, Jepson AD. Eigentracking: Robust matching and tracking of objects using view-based representation. International Journal of Computer Vision. 1998;26(1):63–84.
- 5. Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. ACM SIGGRAPH. 1999.
- 6. Cootes T, Edwards G, Taylor C. Active appearance models. PAMI. 2001;23(6):681–685.
- 7. Cootes TF, Taylor C. Statistical models of appearance for computer vision. Technical report, University of Manchester. 2001.
- 8. de la Torre F, Black MJ. Robust parameterized component analysis: theory and applications to 2D facial appearance models. Computer Vision and Image Understanding. 2003;91:53–71.
- 9. de la Torre F, Collet A, Cohn J, Kanade T. Filtered component analysis to increase robustness to local minima in appearance models. IEEE Conference on Computer Vision and Pattern Recognition. 2007.
- 10. de la Torre F, Vitrià J, Radeva P, Melenchón J. Eigen-filtering for flexible eigentracking. International Conference on Pattern Recognition. 2000:1118–1121.
- 11. Gong S, Mckenna S, Psarrou A. Dynamic Vision: From Images to Face Recognition. Imperial College Press; 2000.
- 12. Gross R, Matthews I, Cohn J, Kanade T, Baker S. The CMU multi-pose, illumination, and expression (Multi-PIE) face database. Technical report TR-07-08, Robotics Institute, Carnegie Mellon University; 2007.
- 13. Jolliffe I. Principal Component Analysis. New York: Springer-Verlag; 1986.
- 14. Jones MJ, Poggio T. Multidimensional morphable models. International Conference on Computer Vision. 1998:683–688.
- 15. Liu X. Generic face alignment using boosted appearance model. IEEE Conference on Computer Vision and Pattern Recognition. 2007.
- 16. Lucas B, Kanade T. An iterative image registration technique with an application to stereo vision. Proceedings of Imaging Understanding Workshop. 1981.
- 17. Matthews I, Baker S. Active appearance models revisited. International Journal of Computer Vision. 2004 Nov;60(2):135–164.
- 18. Matthews I, Ishikawa T, Baker S. The template update problem. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004;26:810–815. doi: 10.1109/TPAMI.2004.16.
- 19. Nayar SK, Poggio T. Early Visual Learning. Oxford University Press; 1996.
- 20. Saragih J, Goecke R. A nonlinear discriminative approach to AAM fitting. International Conference on Computer Vision. 2007.
- 21. Sirovich L, Kirby M. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A: Optics, Image Science, and Vision. 1987 March;4(3):519–524. doi: 10.1364/josaa.4.000519.
- 22. Tsochantaridis I, Joachims T, Hofmann T, Altun Y. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research. 2005;6:1453–1484.
- 23. Turk M, Pentland A. Eigenfaces for recognition. Journal of Cognitive Neuroscience. 1991;3(1):71–86. doi: 10.1162/jocn.1991.3.1.71.
- 24. Vetter T. Learning novel views to a single face image. International Conference on Automatic Face and Gesture Recognition. 1997:22–27.
- 25. Wimmer M, Stulp F, Tschechne SJ, Radig B. Learning robust objective functions for model fitting in image understanding applications. Proceedings of British Machine Vision Conference. 2006.
- 26. Xiao J, Baker S, Matthews I, Kanade T. Real-time combined 2D+3D active appearance models. Conference on Computer Vision and Pattern Recognition. 2004;II:535–542.
