Abstract
We recast the Cosegmentation problem using Random Walker (RW) segmentation as the core segmentation algorithm, rather than the traditional MRF approach adopted in the literature so far. Our formulation is similar to previous approaches in the sense that it also permits Cosegmentation constraints (which impose consistency between the extracted objects from ≥ 2 images) using a nonparametric model. However, several previous nonparametric cosegmentation methods have the serious limitation that they require adding one auxiliary node (or variable) for every pair of pixels that are similar (which effectively limits such methods to describing only those objects that have high entropy appearance models). In contrast, our proposed model completely eliminates this restrictive dependence, and the resulting improvements are quite significant. Our model further allows an optimization scheme exploiting quasiconvexity for model-based segmentation with no dependence on the scale of the segmented foreground. Finally, we show that the optimization can be expressed in terms of linear algebra operations on sparse matrices which are easily mapped to GPU architecture. We provide a highly specialized CUDA library for Cosegmentation exploiting this special structure, and report experimental results showing these advantages.
1 Introduction
The problem of Cosegmentation [1] has attracted much attention from the community in the last few years [2–5]. The basic goal in Cosegmentation is to segment a common salient foreground object from two or more images, as shown in Fig. 1. Here, consistency between the (extracted) object regions is accomplished by imposing a global constraint which penalizes variations between the objects’ respective histograms or appearance models. The idea has been adopted for obtaining concise measures of image similarity [1], discovering common appearance patterns in image sets [6], medical imaging [7], and building 3D models from community photo collections [2, 8]. Motivated by the spectrum of these applications, some recent papers have sought to better understand the optimization-specific aspects of this problem – in particular, issues such as sub-modularity [1], linear programming relaxations [4], dual-decomposition based strategies [5], network flow constructions [7], and maximum-margin inspired interpretations [9]. Most of these works provide good insights on cosegmentation, but only in the context of the traditional Markov Random Field (MRF) based segmentation objective (referred to as graph-cuts [10]). This may be partly because the first work on Cosegmentation [1] presented means for including global constraints within segmentation but was designed specifically for the MRF function. The present paper complements this body of research, and provides an end-to-end analysis of the Cosegmentation problem in terms of the Random Walker segmentation function [11] – these results show that in many cases, well-known advantages of the Random Walker model extend nicely to the Cosegmentation setting as well.
Figure 1.

A representative application for cosegmentation.
A Toy example
An important aspect of our formulation is that it is possible to employ a nonparametric appearance model for arbitrary distributions but without incurring rather substantial additional computational costs. When there are significant regions of homogeneity in the foreground (inline figure), we clearly want a distribution which has a corresponding peak. In Fig. 1, the distribution should capture the fact that the bear is “brown and furry”, and not try to differentiate one patch of fur from another across multiple images. To illustrate why this point is relevant, let us analyze the overhead of some existing methods for cosegmentation by considering a simple toy example (see inline image pair) where we should identify the common blue circle (in distinct backgrounds). Several approaches for cosegmentation with a nonparametric model require that an auxiliary node (or variable) be introduced into the graph whenever two pixels share the same bin [4, 7] (i.e., the two pixels are similar). While the segmentation aspect for the blue circles by itself is easy, the cost of introducing an auxiliary node for perceptually similar pixels can grow very quickly – counting just the blue foreground pixels for a 196 × 196 image pair, one must introduce 42 million additional variables, and the associated cost is infeasible even for moderately sized images. As a result, these previous models are limited to feasibly cosegmenting only those image pairs that have a relatively high entropy distribution (i.e., each bin is shared by only a few pixels). Our formulation has no such limitation, since auxiliary nodes are not needed to perform the optimization. Consequently, it is possible to perform cosegmentation in general settings where the target foreground is summarized with an arbitrary appearance model (color, texture), but with no associated restriction on its entropy. 
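The growth in auxiliary nodes described above can be illustrated with a small count. The sketch below is not taken from [4, 7]; it only assumes, as the text states, that one auxiliary node is needed for every cross-image pair of pixels sharing a bin. The helper name `count_auxiliary_nodes` and the 1000-pixel example are hypothetical.

```python
import numpy as np

def count_auxiliary_nodes(bins_a, bins_b, num_bins):
    """Count cross-image pixel pairs that fall in the same histogram bin.

    Assumption: one auxiliary node per such pair, as described in the text.
    """
    counts_a = np.bincount(bins_a, minlength=num_bins).astype(np.int64)
    counts_b = np.bincount(bins_b, minlength=num_bins).astype(np.int64)
    # Bin k contributes counts_a[k] * counts_b[k] pairs.
    return int(counts_a @ counts_b)

n = 1000
# Low entropy: every foreground pixel in both images lands in a single bin.
low = count_auxiliary_nodes(np.zeros(n, dtype=int), np.zeros(n, dtype=int), n)
# High entropy: every pixel has its own bin, matched to one pixel in the other image.
high = count_auxiliary_nodes(np.arange(n), np.arange(n), n)
print(low, high)
```

With a single shared bin the count grows quadratically in the number of foreground pixels, while the evenly spread case grows only linearly, which is the contrast the toy example is meant to convey.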
Further, several nonparametric cosegmentation models (except [9]) are somewhat limited to segmenting foregrounds which are at the same scale within each image. Our method compares histograms independent of scale, and the objective is shown to be quasiconvex. This is leveraged to develop an optimization scheme which can solve this nonconvex problem with optimality guarantees. Finally, all optimization steps in the core method can be expressed as sparse linear algebra operations, which makes GPU implementations possible.

Related Work
The initial model for Cosegmentation in [1] provided means for including global constraints to enforce consistency among the two foreground histograms in addition to the MRF segmentation terms for each image. The objective function incorporating these ideas was expressed as follows
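\[ \min_{x_1, x_2} \; \sum_{i=1}^{2} E_{\mathrm{MRF}}(x_i) + \lambda\, E_{\mathrm{global}}(h_1, h_2) \tag{1} \]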
where Eglobal(·, ·) was assumed to penalize the ℓ1 variation between each h1 and h2 (the two foreground histograms obtained post segmentation). The appearance model for the histogram was assumed to be generative (i.e., Gaussian), and a novel scheme called trust region graph cuts was presented for optimizing the resulting form of (1). Subsequently, [4] argued in favor of using an ℓ2-distance for Eglobal(·, ·) whereas [7] developed a reward based model. A scale-free method is presented in [12], which biases the segmentation towards histograms belonging to a low-rank subspace. Batra et al. [2] suggested exploiting user interaction if available for cosegmentation (again, using MRF terms) and [3, 9] have adopted a clustering based approach to the problem. Recently, [5] compared several existing MRF-based models, and presented a new posterior-maximizing scheme which was solved using dual decomposition. Other recent ideas include modulating the graph-cuts objective a priori by finding similar patterns across images [3], generating multiple segmentations of each image and identifying which segmentation pair is similar [13], and identifying salient regions followed by a filtering step to leave out distinct regions not shared across images [14]. One reason for these varied strategies for the problem is that when a histogram difference term is added to the segmentation, the resultant objective is no longer submodular (therefore, not easy to optimize). So, the focus has been on improved approximate algorithms for different choices of Eglobal.
A commonality among these existing works has been the preference for MRF (i.e., graph-cuts) based terms for segmentation. Part of the reason is that combinatorial methods such as graph-cuts are extensively used in vision, and are known to be efficient. On the other hand, graph-partitioning methods such as Random Walker [11] also work well for image segmentation and are widely used. Our formalization here suggests that it is also well suited for the Cosegmentation problem and offers efficiency benefits (e.g., issues identified in the blue circles example) in the nonparametric setting.
The contributions of this paper are: (1) We derive a cosegmentation model with the Random Walker segmentation at its core. The model finally reduces to a Box-QP problem (convex program with box constraints). Based on this structure, we propose a specialized (and efficient) gradient projection based procedure which finds a global real-valued optimum of the model (which preserves many advantages of Random Walker [11]). (2) Our model allows for a nonparametric representation of the foregrounds (e.g., using distributions over texture words), but one which permits any distribution of features without incurring additional computational costs. This provides a substantial advantage over the existing nonparametric cosegmentation approaches which are limited only to regions described by a high entropy model (i.e., object features must be spread evenly across bins). (3) We extend this model to a scale-independent penalty. This paper presents a novel optimization method for a class of objectives based on quasiconvex functions. We prove correctness, and demonstrate it for model-based image segmentation. These theoretical results are of independent interest. (4) Our optimization consists of linear algebra operations (BLAS Level 1, 2) on sparse matrices. Consequently, the algorithm is easy to implement on a GPU architecture. We give a specialized open-source CUDA library for Cosegmentation.
2 Random Walker and its properties
The Random Walker segmentation algorithm has been studied extensively in the computer vision literature. Essentially, the method simulates a random walk from each pixel in the image to a set of user specified seed points where the walk is biased by image intensity gradients. The eventual assignment of pixels to foreground or background is determined by whether the pixel-specific walk reaches a foreground (or background) seed first. Observe a subtle yet important property of how the Random Walker model is specified, and what the solution actually denotes. Because of direct analogues in circuit theory and physics, the formalization, even in its original form, seeks a solution in reals (not integers). What is eventually solved is therefore not a relaxation because the variables have a clear probabilistic meaning. As a result, thresholding these probabilities at 0.5 is statistically sound; conceptually, this is different from solving a binary linear program in reals and recovering a {0, 1} solution by rounding. In practice, Random Walker is optimized by recasting segmentation as the solution to a combinatorial Dirichlet problem. Random Walker derived segmentations offer some benefits with respect to boundary length regularization, number of seeds, metrication errors, and shrinking bias [15].
3 Random Walker for Cosegmentation
We begin our presentation by rewriting the Random Walker algorithm for a single image as a quadratic minimization problem (also see [16]). As is common, we assume a 4-connected neighborhood over the image, weighted according to a Gaussian function of normalized Euclidean distances between pixel intensities, wij = exp (−β||pi − pj||). The Laplacian L of the graph is then
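\[ L_{ij} = \begin{cases} \sum_{k} w_{ik} & \text{if } i = j \\ -w_{ij} & \text{if } i \text{ and } j \text{ are neighbors} \\ 0 & \text{otherwise} \end{cases} \tag{2} \]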
The Laplacian is diagonally dominant and so L ≽ 0; we can derive the following convex quadratic program,
\[ \min_{x} \; x^T L x \quad \text{subject to} \quad x^{(s)} = m^{(s)} \tag{3} \]
where x(s) are the values for certain seed pixels, and m(s) is the known value of those seeds (i.e., foreground or background). Each component of the solution x* will then be a pixel’s probability of being assigned to the foreground. To output a {0, 1} segmentation, we may threshold x* at 0.5 to obtain a hard segmentation x ∈ {0, 1}^n which matches the solution from [11].
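As an illustration of the quadratic form of Random Walker, the following minimal sketch builds the Laplacian of a 4-connected grid and solves the seeded problem by eliminating the seed variables, so that the free pixels satisfy L_uu x_u = −L_us x_s. It is a sketch under stated assumptions, not the implementation used in this paper: the function names, the value β = 10, the min-max intensity normalization, and the direct sparse solve are all illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def grid_laplacian(image, beta=10.0):
    """Laplacian of a 4-connected grid with weights exp(-beta * |p_i - p_j|)."""
    rows, cols = image.shape
    idx = np.arange(rows * cols).reshape(rows, cols)
    p = image.astype(float)
    span = p.max() - p.min()
    if span > 0:
        p = (p - p.min()) / span  # normalize intensities to [0, 1] (assumption)
    # Horizontal and vertical neighbor pairs.
    i = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    j = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    diff = np.concatenate([(p[:, :-1] - p[:, 1:]).ravel(),
                           (p[:-1, :] - p[1:, :]).ravel()])
    w = np.exp(-beta * np.abs(diff))
    n = rows * cols
    W = sp.coo_matrix((w, (i, j)), shape=(n, n))
    W = (W + W.T).tocsr()
    degree = np.asarray(W.sum(axis=1)).ravel()
    return (sp.diags(degree) - W).tocsr()

def random_walker(L, seed_idx, seed_val):
    """Minimize x^T L x with seed entries fixed; returns foreground probabilities."""
    n = L.shape[0]
    x = np.zeros(n)
    x[seed_idx] = seed_val
    is_seed = np.zeros(n, dtype=bool)
    is_seed[seed_idx] = True
    s = np.flatnonzero(is_seed)
    u = np.flatnonzero(~is_seed)
    L_uu = L[u][:, u].tocsc()
    L_us = L[u][:, s]
    # Stationarity of the quadratic in the free variables.
    x[u] = spsolve(L_uu, -(L_us @ x[s]))
    return x

# Two-region image: left half dark (foreground seed), right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
prob = random_walker(grid_laplacian(img), np.array([0, 7]), np.array([1.0, 0.0]))
segmentation = prob.reshape(8, 8) > 0.5
```

Thresholding `prob` at 0.5 yields the hard segmentation; for large images an iterative solver would normally replace the direct solve.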
Pre-processing
Cosegmentation methods [7] use a pre-processing procedure to determine inter- (and intra-) image pixel similarity. This is generated by tessellating the RGB color space (i.e., pixel distribution) into clusters or by using SIFT (or color pattern models, edge-profiles, textures etc.) based correspondence methods, see [17]. We can derive a matrix H such that
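\[ H_{kj} = \begin{cases} 1 & \text{if pixel } j \text{ is in histogram bin } k \\ 0 & \text{otherwise} \end{cases} \tag{4} \]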
Here, pixels are assigned to the same bin if they are similar. With an appropriate H, the global term Eglobal from (1) requires that at the level of individual histogram bins k, the algorithm assign approximately the same number of pixels to each foreground region (the objective incurs a penalty proportional to this difference). This ensures that the appearance models of the two foregrounds based on the features of interest are similar, and has been used very successfully in object recognition [18]. Observe that this difference only serves as a regularizer for the main segmentation task, and does not drive it on its own. This is relevant because as with any global measure, such models (and the measurement of their variations) may not be perfect. But existing literature suggests that when derived from good context-based features [18], such appearance model based differences provide a meaningful global bias for Cosegmentation [2].
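To make the role of H concrete, the sketch below builds a bins-by-pixels indicator matrix from a per-pixel bin label and evaluates the foreground histogram as the product Hx. The function name `assignment_matrix` and the six-pixel example are hypothetical, and binary entries are assumed (a confidence-weighted H would simply replace the ones).

```python
import numpy as np
import scipy.sparse as sp

def assignment_matrix(bin_of_pixel, num_bins):
    """Sparse (num_bins x n) matrix with H[k, j] = 1 when pixel j is in bin k."""
    n = bin_of_pixel.size
    return sp.csr_matrix((np.ones(n), (bin_of_pixel, np.arange(n))),
                         shape=(num_bins, n))

bins = np.array([0, 0, 1, 2, 2, 2])             # bin label of each of 6 pixels
H = assignment_matrix(bins, 3)
x = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0])    # 1 = foreground
h = H @ x                                       # foreground histogram
print(h)  # [2. 0. 2.]
```

Because H has exactly one nonzero per column, the product costs one pass over the pixels regardless of how many pixels share a bin.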
Cosegmentation for 2+ images
Given a segmentation for n pixels, x ∈ {0, 1}n, one may use the H matrix from (4) to write the histogram of only the foreground pixels as h = Hx. The expression gives the form of constraints needed for Cosegmentation. Let L1, ···, Lm be the Laplacian matrices of graphs constructed using each of the images, and H1, ···, Hm be the histogram assignment matrices from (4), with the property that for pixels j and j′ if and only if j and j′ are similar. Here, if one uses SIFT matches, then the matrix entry may reflect the confidence of the match. Now, we seek to segment the two images simultaneously, under the constraint that their histograms match. For this purpose, it suffices to consider the following optimization model
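\[ \min_{x_i, h_i, \bar h} \; \sum_{i=1}^{m} \left( x_i^T L_i x_i + \lambda \|h_i - \bar h\|_2^2 \right) \quad \text{s.t.} \quad x_i \in [0,1]^{n_i},\; x_i^{(s)} = m_i^{(s)},\; H_i x_i = h_i,\; i = 1, \dots, m \tag{5} \]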
The second term in the objective above corresponds to Eglobal(h1, h2) in (1), and the last constraint sets up the foreground histograms, hi using Hi and xi (bin k in H1 corresponds to bin k in H2, ···, Hm which makes a direct comparison between histograms possible). Instead of comparing the histograms to each other, we compare them to a common global histogram h̄ which at the optimum will be the centroid of the hi’s. The resulting inter-image matching penalty will then be the trace of the covariance between foreground histograms across the image set. This model additionally extends to multiple labels (i.e., multiple objects) by adding additional columns to the optimization variables identically to [11]. The resulting problem can be easily decomposed into separate segmentations for each object class. Existing cosegmentation methods, on the other hand, mostly tackle figure-ground labeling.
Each Laplacian matrix Li is positive semidefinite, so along with the histogram distances the objective function is convex. Further, the feasible region is the intersection of bound constraints and linear equalities. We directly have:
Theorem 3.1
For λ ≥ 0, (5) is a convex problem.
3.1 Deriving an equivalent Box-QP
The model in (5) can already be solved using widely available convex programming methods, and provides the desired solution to the Cosegmentation problem using the Random Walker segmentation function. Next, we will derive an equivalent model (but with a much nicer structure) that will allow the design of specialized solvers and thereby lead to far more efficient algorithms.
Consider the left hand side of the equality constraint on each hi, substituted into the objective function with a penalty ||h1 − h2||. Further, let us choose bounds to limit x to the unit box as well as suitably enforce the seed constraints. This process gives a quadratic problem of the form
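\[ \min_{x_1, x_2} \; \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} L_1 + \lambda H_1^T H_1 & -\lambda H_1^T H_2 \\ -\lambda H_2^T H_1 & L_2 + \lambda H_2^T H_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{s.t.} \quad l_i \le x_i \le u_i \tag{6} \]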
where the 2-tuple (li, ui) is (1, 1) for foreground seeds, (0, 0) for background seeds, and (0, 1) otherwise. For m > 2 images we optimize over x1, ···, xm, h̄ with quadratic objective matrix
It can be verified that (6) is equivalent to (5). The difference is that it is now expressed as a bound-constrained quadratic problem (or box-QP due to the box constraints). Like (5), the model in (6) also permits general purpose convex programming methods. However, we can design means to exploit its special structure since the model is nearly an unconstrained quadratic problem.
4 Scale-Free Cosegmentation
A limitation of previous cosegmentation methods is their sensitivity to the scale of the target object, since histogram-based priors are dependent on scale. For example, if an otherwise identical object appears in the second image such that it occupies twice as many pixels as in the first image, then h2 = 2h1. Consequently, ||h2 − h1|| > 0, meaning that the larger scale is penalized in traditional formulations. We show here how our formulation may be made scale-invariant. Formally, our goal is to modify the cosegmentation term to satisfy
| (7) |
This property may be satisfied by a normalization step,
| (8) |
Here we further note that under minimization and choice of the parameter λ in the combined model this is equivalent to minimizing
| (9) |
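The effect of normalization can be checked numerically. The sketch below assumes that the normalization step divides each histogram by its Euclidean norm (our reading of the text, since the displayed equations are not reproduced here); the function names are hypothetical. It also checks the identity ‖u1 − u2‖² = 2 − 2 u1ᵀu2 for unit vectors, which is why minimizing the normalized distance amounts to maximizing the cosine between the histograms.

```python
import numpy as np

def l2_histogram_distance(h1, h2):
    """Squared Euclidean distance between raw histograms (scale dependent)."""
    return float(np.sum((h1 - h2) ** 2))

def normalized_distance(h1, h2):
    """Squared distance after scaling each histogram to unit Euclidean norm."""
    u1 = h1 / np.linalg.norm(h1)
    u2 = h2 / np.linalg.norm(h2)
    return float(np.sum((u1 - u2) ** 2))

def cosine(h1, h2):
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))

h1 = np.array([30.0, 10.0, 0.0, 5.0])
h2 = 2.0 * h1                          # same object at twice the pixel count
h3 = np.array([5.0, 20.0, 10.0, 0.0])  # a different appearance

raw_gap = l2_histogram_distance(h1, h2)    # positive: the larger scale is penalized
scaled_gap = normalized_distance(h1, h2)   # zero up to rounding
```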
Substituting these normalized histograms in (5) leads to a function that cannot be efficiently optimized. However, in the Random Walker setting, we can optimize this function when the model histogram h̄ is a fixed unit vector. The resulting problem is related to model-based segmentation, where we are imposing a known histogram distribution in segmenting images. For image i we solve the problem
| (10) |
where we name
\[ \hat g_{\bar h}(h) = -\frac{\bar h^T h}{\|h\|} \tag{11} \]
In order to proceed with the minimization of our scale-invariant energy, we must first establish some properties of (11).
Definition 4.1 ([19])
We define a function f to be quasiconvex if its sublevel sets are convex subsets of the domain. Equivalently, for any x, x′ in the domain of f and λ ∈ [0, 1]
\[ f(\lambda x + (1 - \lambda) x') \le \max\{ f(x), f(x') \} \tag{12} \]
We call (12) the “Jensen’s Inequality for Quasiconvexity.”
Theorem 4.2
ĝh̄ as defined in (11) is quasiconvex.
Proof
Consider any h1, h2 wlog chosen to satisfy
Multiply by (1 − λ)||h1|| ||h2|| ≥ 0
Add λh̄Th1||h1||,
taking any λ ∈ [0, 1].
Since all these vectors are in the nonnegative orthant, h̄Th1, h̄Th2 ≥ 0, and −ĝh̄ (h2) ≥ −ĝh̄ (h1) this inequality is equivalent to:
| (13) |
Using this expression with the triangle inequality to show Jensen’s inequality for quasiconvexity (12) gives
Using (13),
Corollary 4.3
The scale-free energy on x, gh̄ (x) = ĝh̄ (Hx) is quasiconvex for histogram assignment matrix H.
Proof
Here gh̄ is the composition of ĝh̄ with the linear map x ↦ Hx. Composition with an affine map preserves quasiconvexity (see [20]).
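The quasiconvexity claim can be sanity-checked numerically. The sketch below assumes the penalty has the form ĝh̄(h) = −h̄ᵀh/‖h‖, which is our reading of the proof of Theorem 4.2, and tests the max-form inequality of Definition 4.1 on random nonnegative histograms. A random test of this kind can only fail to find counterexamples; it is not a substitute for the proof.

```python
import numpy as np

def g_hat(h, h_bar):
    """Assumed scale-free penalty: negative cosine-like score against h_bar."""
    return -float(h_bar @ h) / float(np.linalg.norm(h))

rng = np.random.default_rng(0)
h_bar = rng.random(8)
h_bar /= np.linalg.norm(h_bar)   # fixed unit-norm model histogram

violations = 0
for _ in range(2000):
    h1 = rng.random(8) + 1e-3    # strictly positive, so the norm is never zero
    h2 = rng.random(8) + 1e-3
    lam = rng.random()
    mix = lam * h1 + (1.0 - lam) * h2
    bound = max(g_hat(h1, h_bar), g_hat(h2, h_bar))
    if g_hat(mix, h_bar) > bound + 1e-12:
        violations += 1

# Scale invariance: multiplying h by a positive constant leaves the value unchanged.
scale_gap = abs(g_hat(3.0 * h1, h_bar) - g_hat(h1, h_bar))
```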
The next section exploits these properties of gh̄ to solve the segmentation problem using this penalty.
Theorem 4.4
gh̄(x) is Lipschitz-smooth when ||Hx|| > 0.
Proof
Let . Assume wlog ||h1|| ≥ ||h2||
| (14) |
Thus for any function f which is L-smooth,
| (15) |
And if we lower-bound ||h1|| ≥ ||h2|| ≥ C > 0, f(·̂) is -smooth.
In our case, gh̄ (h) is an affine function of ĥ, and any affine function will be Lipschitz.
4.1 Nonconvex Sum Minimization
For the following, we consider the setting of minimizing h = f + g, such that f is convex, g is quasiconvex and both are bounded below. Note that under these conditions, h is not necessarily quasiconvex and may have multiple local minima. Nonetheless, our method proposed below can optimize our segmentation objective with f(x) = xTLx and g as defined in Corollary 4.3. Let , similarly define and .
Theorem 4.5
Define
| (16) |
For x any solution to .
Proof
Since x is feasible for . With x, both feasible for the problem, x is a solution iff .
We can add these inequalities to show , which by definition of gives equality.
Theorem 4.6
P(α) has no solutions for , and x is a solution to iff x is a solution .
Proof
For the feasible set of P(α) is the empty set, by the definition of . Take any . The feasible region of P(α) is a superset of the feasible region of . Let x* be a solution to P(α). Since lies in the feasible region of this problem and for any x, then . As long as the minimum of f lies in the feasible region of P(α), this will be a solution.
Definition 4.7 ([21])
A function f : [a, b] → ℝ is one-sided Lipschitz if for any x1, x2 ∈ [a, b]
\[ (x_1 - x_2)\,\bigl(f(x_1) - f(x_2)\bigr) \le m\,(x_1 - x_2)^2 \tag{17} \]
for some m.
Intuitively, in the case that f is continuously differentiable, the Lipschitz condition bounds |f′(x)| while this condition only guarantees an upper bound on f′(x). This is illustrated in Figure 3.
Figure 3.

Given a pair (x, f(x)) for a L-Lipschitz function f, the Lipschitz condition guarantees that the function will always lie between two lines of slope L and −L through this point and in the shaded region shown in (a). The one-sided Lipschitz condition in Definition 4.7 only bounds the rate of increase, and the function must lie in the shaded region shown in (b).
Lemma 4.8
Let ∘ denote the composition operator: (g ∘ P)(α) = g(P(α)). We claim:
| (18) |
Proof
Let x = P(α). Then x satisfies the KKT conditions for the P(α) problem and either:
- Case 1) x is the unconstrained minimum of f (i.e. 0 ∈ ∂f(x))
- Case 2) x lies on the boundary of the feasible region (so the g(x) ≤ α constraint is active and −∂f(x) ∩ ∂g(x) ≠ ∅).1
As previously used for Theorem 4.6, case 1 will only be true for , with equality in the range we consider here. In case 2, we have the theorem simply from the fact that the constraint is active.
Lemma 4.9
f ∘ P is monotonically non-increasing.
Proof
Take any α1 < α2. Let x1 = P(α1) and x2 = P(α2). Since g(x1) = α1 < α2 we know x1 is feasible for the P(α2) problem. Thus, since x2 is a minimizer for the P(α2) problem, f(x1) ≥ f(x2).
Theorem 4.10
h ∘ P is one-sided Lipschitz.
Proof
Take any .
Since f ∘ P is monotonically non-increasing,
So if we let m ≥ 1, and because (α1 − α2)2 ≥ 0,
Using Lemma 4.8,
So that h is one-sided Lipschitz with constant m.
With the result of Theorem 4.10, by simply sampling h(P(α)) densely enough, we can get an estimate of this function to arbitrary precision over the entire interval . If we select a global minimum of this estimated function, we can derive a point with objective arbitrarily close to the true minimum of h. The bounds are given from the following theorem:
Lemma 4.11
For any interval and any α ∈ [α1, α2].
| (19) |
Proof
Take m = 1 in Theorem 4.10, applied to this particular α, α2:
By interval construction α − α2 ≤ 0, so
Theorem 4.12
Take an increasing sequence α1, …, αn from the domain of h ∘ P with and . Let τ be the maximum gap αi+1 − αi. Denote the minimum sample by α*, chosen such that
| (20) |
Then the function h has lower bound:
| (21) |
Proof
By Theorem 4.5, we can instead show
| (22) |
Consider any α. By our choice of α1, …, αn there is an αi, αi+1 such that α ∈ [αi, αi+1]. By Lemma 4.11,
by the construction of α*,
and since τ ≥ αi+1 − αi
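The sampling strategy of this section can be illustrated on a one-dimensional toy problem. Everything in the sketch below is an assumption made for illustration: f(x) = (x − 2)² is convex, g(x) = √|x| is quasiconvex but not convex, P(α) is taken to be the minimizer of f subject to g(x) ≤ α (available in closed form here), and α is sampled between 0 and g evaluated at the unconstrained minimizer of f. It is not the segmentation objective itself.

```python
import numpy as np

def f(x):
    return (x - 2.0) ** 2            # convex term

def g(x):
    return np.sqrt(np.abs(x))        # quasiconvex, not convex

def P(alpha):
    """argmin f(x) s.t. g(x) <= alpha; the feasible set is [-alpha^2, alpha^2]."""
    return float(np.clip(2.0, -alpha ** 2, alpha ** 2))

def sampled_minimum(num_samples):
    """Sample h(P(alpha)) on a uniform grid of alpha and keep the best sample."""
    alpha_max = g(2.0)               # g at the unconstrained minimizer of f
    alphas = np.linspace(0.0, alpha_max, num_samples)
    xs = np.array([P(a) for a in alphas])
    values = f(xs) + g(xs)
    k = int(np.argmin(values))
    return float(alphas[k]), float(xs[k]), float(values[k])

alpha_star, x_star, value = sampled_minimum(2001)
```

In this toy, h = f + g has a local minimum at x = 0 in addition to its global minimum, so a purely local descent method started near zero could stall there, whereas the sampled sweep over α does not depend on a starting point. Refining the grid (a smaller maximum gap τ) tightens the gap between the best sample and the true minimum.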
4.2 Application to Scale-Free Cosegmentation
In the scale-free cosegmentation setting our P(α) problem consists of
| (23) |
In this case f is convex and g is quasiconvex. The P(α) problem can therefore be solved efficiently using ordinary methods. We can calculate as the solution to the ordinary random-walker segmentation and is the projection of h̄ onto the feasible set under seed constraints.
5 Optimization
This section describes our strategy for solving (6) to optimality in a highly efficient manner. We use a projected gradient-based method which would additionally form the key component if we were to use any variation (e.g., augmented Lagrangian) as stated explicitly in [22].
Identifying a Sparse Structure
Expressing our model as a purely bound-constrained problem as in (6) requires the formation of the products which form dense n × n matrices. Consequently, our optimization method must be careful not to explicitly form these matrices. Fortunately, we observe that explicit formulation of these matrices may be avoided by gradient projection methods, which are a simple and efficient class of algorithms in which it is only necessary to be able to calculate matrix-vector products. Here, this product can be distributed over the sparse components,
| (24) |
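A matrix-free product of this kind can be sketched as follows. The block structure is an assumption: the sketch takes the two-image objective to be x1ᵀL1x1 + x2ᵀL2x2 + λ‖H1x1 − H2x2‖², whose quadratic matrix has blocks Li + λHiᵀHi on the diagonal and −λHiᵀHj off the diagonal. The function name `boxqp_matvec` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def boxqp_matvec(L1, L2, H1, H2, lam, x1, x2):
    """Apply the assumed two-image Box-QP matrix to (x1, x2).

    Only sparse matrix-vector products are used; the dense products
    H_i^T H_j are never formed.
    """
    r = H1 @ x1 - H2 @ x2            # histogram residual, one entry per bin
    y1 = L1 @ x1 + lam * (H1.T @ r)
    y2 = L2 @ x2 - lam * (H2.T @ r)
    return y1, y2

# Small random instance; the two images may have different pixel counts.
rng = np.random.default_rng(0)
n1, n2, k, lam = 12, 10, 4, 0.7
A = sp.random(n1, n1, density=0.3, format="csr", random_state=0)
B = sp.random(n2, n2, density=0.3, format="csr", random_state=1)
L1, L2 = A + A.T, B + B.T            # symmetric stand-ins for the Laplacians
H1 = sp.random(k, n1, density=0.5, format="csr", random_state=2)
H2 = sp.random(k, n2, density=0.5, format="csr", random_state=3)
x1, x2 = rng.random(n1), rng.random(n2)
y1, y2 = boxqp_matvec(L1, L2, H1, H2, lam, x1, x2)
```

The cost per product is proportional to the number of nonzeros in the Laplacians and assignment matrices, rather than to the square of the pixel count.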
With this modification, we can solve our Box-QP in (6) by adapting the Gradient Projection/Conjugate Gradient (GPCG) algorithm of [23]. We describe this strategy next.
5.1 GPCG
GPCG solves quadratic problems with a rectilinear feasible region Ω = {x : l ≤ x ≤ u}. The algorithm alternates between two main phases (GP and CG): these correspond to alternately estimating the active set at the minimum and finding the minimum while keeping the active set fixed.
A) Gradient Projection (GP)
Gradient projection coupled with the projected line search allows fast searching of a wide range of Ω. As a result, we can rapidly estimate which face of the feasible region the optimum lies on.
A.1) Search Direction for GP
In this phase, we search along a projected gradient ∇Ω (x) which has been modified so the corresponding search direction d = −∇Ω does not point outside the feasible region. Specifically, this search direction is constructed to satisfy ∃ ε > 0 such that x + εd ∈ Ω. We then use a projected line search (described later) to arrive at a step length α and update x ← x + αd.
A.2) Phase Switch for GP
We switch to the conjugate gradient phase if either of the following conditions is satisfied: (a) The active set {i : xi = li or xi = ui} remains unchanged between two iterations; note that the active set corresponds to the minimal face (w.r.t. inclusion) of Ω to which x belongs; (b) GP is making little progress.
B) Conjugate Gradient (CG)
We search a given face of the feasible region of our model using the conjugate gradient phase described below.
B.1) Search Direction for CG
Given the active set, our algorithm calculates a search direction conjugate to the previous direction (under the projection on to the free variables). Note that this method of generating a search direction is the same as applying ordinary conjugate gradient descent to a restriction of the QP to the current minimal face.
B.2) Phase Switch for CG
If the projected gradient points out of the current face (or if the iterations are making little progress), we switch back to the gradient projection (GP) phase. Formally, this is true if ∃i ∈ (x) and either
Note that these “phase switch” conditions will never be satisfied for the face which contains the global minimum for our model. Thus, when the gradient projection phase has found the correct active set, conjugate gradient iterations suffice to explore the final face.
C) Projected Line Search
The projected line search modifies the active set by an arbitrary number of elements, and helps our GPCG process. Given a starting point x and search direction d, the line search finds α > 0 which produces a sufficient decrease under the Armijo rule of the function φ(α) = (P[x + αd]), where P describes the projection function which maps its input to the closest point in Ω. This can be thought of as a “line search” along a 1-manifold which bends to stay within Ω (thus, not a ray). Rather than directly finding all the piecewise quadratic segments of φ(α), we efficiently produce a sufficient decrease using an estimate of φ by sampling one point at a time (as in [23]). It can be verified that all operations above can be expressed as Level 1, 2 BLAS operations. This allows a highly parallel implementation, described next.
D) GPU Implementation
Graphics Processing Units (GPUs) have gained attention from a wide range of the scientific computing community for their ability to efficiently address parallel problems [24]. These architectures operate by running multiple instances of a piece of code simultaneously, operating on different parts of a dataset. While this approach is not well-suited to all algorithms, Level 1 and 2 BLAS operations used in our algorithm are known to fit well with this architecture and can therefore exhibit a significant speedup. The linear algebra operations comprising our GPCG algorithm for Cosegmentation may be easily parallelized using high-level languages such as CUDA. In fact, the cuSPARSE toolkit (used here) supports Level 2 and 3 BLAS operations on sparse matrices as well. Further, the control flow of our procedure relies only on the results of accumulations, so the standard bottleneck of transferring data between main and GPU memory is not a major factor, and entry-level hardware (found on many workstations already) is sufficient.
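The gradient projection phase and projected line search of Section 5.1 can be sketched on the CPU as follows. This is a simplified illustration rather than the GPCG algorithm of [23] or the CUDA library: it omits the conjugate gradient phase and the phase-switch tests, and the function names, the Armijo constant `mu`, and the step-halving schedule are assumptions. Only matrix-vector products, dot products, and element-wise operations are used, in keeping with the Level 1 and 2 BLAS structure noted above.

```python
import numpy as np

def project(x, lo, hi):
    """Closest point to x in the box lo <= x <= hi."""
    return np.minimum(np.maximum(x, lo), hi)

def projected_gradient(x, grad, lo, hi):
    """Zero the gradient components whose descent step would leave the box."""
    pg = grad.copy()
    pg[(x <= lo) & (grad > 0)] = 0.0
    pg[(x >= hi) & (grad < 0)] = 0.0
    return pg

def gradient_projection(Q, lo, hi, x0, iters=2000, mu=0.1, tol=1e-10):
    """Minimize x^T Q x over the box using projected Armijo backtracking."""
    x = project(x0, lo, hi)
    for _ in range(iters):
        grad = 2.0 * (Q @ x)
        d = -projected_gradient(x, grad, lo, hi)
        if np.linalg.norm(d) < tol:
            break
        fx = x @ (Q @ x)
        alpha = 1.0
        while True:
            x_new = project(x + alpha * d, lo, hi)
            # Sufficient decrease measured along the bent (projected) path.
            if x_new @ (Q @ x_new) <= fx + mu * (grad @ (x_new - x)):
                break
            if alpha < 1e-12:        # guard against an endless backtracking loop
                break
            alpha *= 0.5
        x = x_new
    return x

# Chain of 5 pixels: first is a foreground seed, last is a background seed.
Q = np.array([[1, -1, 0, 0, 0],
              [-1, 2, -1, 0, 0],
              [0, -1, 2, -1, 0],
              [0, 0, -1, 2, -1],
              [0, 0, 0, -1, 1]], dtype=float)
lo = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
hi = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
x = gradient_projection(Q, lo, hi, np.zeros(5))
```

On this chain the seeds are encoded purely through the bounds (l = u at seed pixels), and the free pixels converge to the linear interpolation between the two seed values.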
6 Histogram Construction
In this section we discuss the fact that a wide variety of different preprocessing steps can take advantage of the same core method to produce a cosegmentation result.
At the heart of cosegmentation is some notion of similarity between pixels. The key property of cosegmentation over independent segmentation is that the foregrounds should contain similar distributions of pixels. In the group of cosegmentation methods we consider here, we rely on histograms, in which similar pixels are placed in the same histogram bin and dissimilar pixels are placed in different histogram bins. This is done through a histogram assignment matrix H as in (4).
In order to apply a cosegmentation method to a pair of images, we must first construct this matrix. One aspect of the problem is to assign a bin to some subset (or all) of the pixels between the two images. This needs to be done in such a way that similar foregrounds have similar histograms so that it matches a common-sense idea of when two foregrounds are similar. This includes some invariance to lighting, rotation, etc.
The bin assignment problem can be divided into two steps:
1. Assign local descriptors [25] to each pixel independently.
2. Find sets of “similar” pixels.
6.1 Texture-Based Histograms
We empirically found high-quality dense histograms were most consistently produced by clustering in a descriptor space based on texture. We derive the descriptors by applying the 17-filter bank of texture filters proposed in [18] shown in Figure 6. The responses at each pixel can then be clustered using nonparametric methods such as mean-shift or the greedy clustering of [26] with modified stopping conditions, avoiding explicitly specifying the number of clusters k (as we found the optimal k varies somewhat between images).
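The two-step binning can be sketched as below. This is a loose illustration under several assumptions: it uses a reduced grayscale bank of Gaussian, Laplacian-of-Gaussian, and Gaussian-derivative responses rather than the 17-filter bank of [18], and a simple radius-based greedy clustering rather than mean-shift or the method of [26]. The function names and the `radius` parameter are hypothetical. Like the methods named above, the clustering does not take the number of clusters as input.

```python
import numpy as np
from scipy import ndimage

def texture_descriptors(gray, sigmas=(1.0, 2.0)):
    """Stack filter responses per pixel; returns an (n_pixels, n_filters) array."""
    feats = []
    for s in sigmas:
        feats.append(ndimage.gaussian_filter(gray, s))                 # blur
        feats.append(ndimage.gaussian_laplace(gray, s))                # LoG
        feats.append(ndimage.gaussian_filter(gray, s, order=(0, 1)))   # d/dx
        feats.append(ndimage.gaussian_filter(gray, s, order=(1, 0)))   # d/dy
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

def greedy_bins(desc, radius):
    """Greedy clustering: each new center absorbs all unassigned points within radius."""
    bins = np.full(len(desc), -1)
    k = 0
    while True:
        unassigned = np.flatnonzero(bins < 0)
        if unassigned.size == 0:
            return bins, k
        center = desc[unassigned[0]]
        near = np.linalg.norm(desc[unassigned] - center, axis=1) <= radius
        bins[unassigned[near]] = k
        k += 1

# Two flat regions side by side: interior pixels of each side share a bin.
image = np.zeros((16, 16))
image[:, 8:] = 1.0
desc = texture_descriptors(image)
bins, num_bins = greedy_bins(desc, radius=0.1)
```

The resulting `bins` array is exactly the per-pixel bin label needed to assemble the assignment matrix H in (4).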
Figure 6.

Winn filters from [18] used in our histogram generation. The first three blur filters are applied to each channel of the Lab colorspace, with the rest applied only to the lightness channel.
As an alternative to naïve clustering, the authors of [18] further present a method of training a texture dictionary from an incompletely labeled dataset. They present an agglomerative clustering algorithm based on maximizing a probability
| (25) |
for histograms Hn and training labels ĉ. This clustering can be leveraged in texture based cosegmentation algorithms, producing a binning method optimized to handle differentiating between object classes in the training data.
As described above, the first step produces local contextual descriptors of each image pixel. The clustering step finds those pixels which are similar under the given descriptor (similar color and texture will be in the same bin). We also consider SIFT based features in step one, and an optical-flow-based pipeline, discussed below.
6.2 SIFT-based
A popular class of descriptors are those based on gradient binning, including SIFT, GLOH, etc. The high dimension of these descriptors makes it relatively difficult to find high-quality dense matches between the feature vectors of different pixels. In our setting (which does not allow, e.g., assuming a homography), we found the matching to be either sparse or overly specific. Representative matching and clustering results are shown in Figure 7, which are reasonable but did not lead to better segmentation results. This additional module was thus not utilized further.
Figure 7.

Clustering (a) and matching (b) results generated using VLFeat [27].
6.3 Histograms from Optical Flow
The flexibility of our method with respect to histogram construction allows the use of application-specific preprocessing steps. There is a natural parallel between cosegmentation and the task of segmenting video. When the images are temporally related, we have a relationship between foreground pixels determined by the movement shown in the video. This is a well-studied problem in computer vision, which gives us access to a class of optical flow methods.
We can find corresponding pixels using optical flow, placing pixels i and i′ in the same bin if the position of i maps to i′ in the next image. Applying this scheme to the frames of a video sequence suggests the model
| (26) |
where the second sum compares subsequent frames of an image sequence.
In Fig. 16 we use correspondences from optical flow to map superpixel-based bins across the frames of a video sequence and segment them. Other methods for determining correspondences can be substituted with no change to our algorithm.
Figure 16.

Segmentation using correspondences from optical flow on video sequence from [28]. Shows outline of segmented foreground in red, with foreground and background indications. Our algorithm achieves 99.3% accuracy.
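The bin assignment described above can be sketched as follows. This is a minimal per-pixel illustration, not the superpixel-based pipeline used for Fig. 16: the function name `propagate_bins`, the `(dy, dx)` flow layout, and the nearest-pixel rounding are all assumptions made for the example, and the flow field is taken as given from any optical flow method.

```python
import numpy as np

def propagate_bins(bins_t, flow):
    """Carry bin labels from frame t to frame t+1 along a dense flow field.

    bins_t : (H, W) int array, bin index of each pixel in frame t
    flow   : (H, W, 2) float array, flow[y, x] = (dy, dx) displacement (assumed layout)
    Returns an (H, W) int array for frame t+1; -1 marks pixels that no
    flow vector lands on.
    """
    h, w = bins_t.shape
    bins_next = np.full((h, w), -1, dtype=int)
    ys, xs = np.mgrid[0:h, 0:w]
    # Round each target position to the nearest pixel of the next frame.
    ty = np.rint(ys + flow[..., 0]).astype(int)
    tx = np.rint(xs + flow[..., 1]).astype(int)
    inside = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    # Pixel i in frame t and its target i' in frame t+1 share a bin.
    # If several pixels land on the same target, one of them wins arbitrarily.
    bins_next[ty[inside], tx[inside]] = bins_t[inside]
    return bins_next

bins_t = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [2, 2, 2]])
flow = np.zeros((3, 3, 2))
flow[..., 1] = 1.0  # every pixel moves one column to the right
bins_next = propagate_bins(bins_t, flow)
```

Pixels left unassigned (the -1 entries) would need a fallback bin, for instance from the appearance-based clustering of Section 6.1.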
6.4 Histogram Entropy
As discussed above, low-entropy histograms allow more accurate matching between some image pairs. Intuitively, recall the example from Fig. 1 where a high-entropy histogram tries to differentiate between different patches of fur on a brown bear. This cannot be done in a consistent manner across realistic images, so erroneous matches are introduced. By contrast, our histogram matching technique better captures the true texture of the bear’s fur as a combination of very few homogeneous textures. We verify this experimentally in Fig. 8, which plots the statistical distance between the histograms of the true foregrounds for a sample of image pairs from our dataset. In these cases we found that as the entropy increases, so does the JS divergence measure between the histograms for the true segmentations. Low entropy histograms, however, relate to smaller divergences, which impose the global Cosegmentation constraint more tightly.
Figure 8.

Comparing the entropy of the constructed histogram (x axis), with a symmetric KL-divergence between the true foreground histograms (y axis), as we vary the number of clusters. Each line shows one image pair. We consistently see higher-entropy histograms producing greater divergence for the target segmentation.
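For readers who want to reproduce this kind of comparison, the two quantities can be computed from raw bin counts as in the sketch below. It uses the standard definitions of Shannon entropy and Jensen-Shannon divergence (base-2 logarithms); the function names are illustrative and the exact normalization used for Fig. 8 is not specified here, so treat this as an assumption.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) of a histogram given as raw bin counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence (bits) between two histograms."""
    p = np.asarray(counts_a, dtype=float)
    q = np.asarray(counts_b, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # b = m is positive wherever a is positive
        return float((a[mask] * np.log2(a[mask] / b[mask])).sum())

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

uniform_entropy = entropy([1, 1, 1, 1])        # four equally filled bins
single_bin_entropy = entropy([4, 0, 0, 0])     # everything in one bin
disjoint_js = js_divergence([1, 0], [0, 1])    # histograms with no overlap
identical_js = js_divergence([3, 1], [6, 2])   # same shape, different mass
```

With base-2 logarithms the divergence lies in [0, 1], which makes values comparable across different numbers of clusters.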
7 Experiments
Our experimental evaluations included comparisons of our implementation (on an Nvidia GeForce 470) to another Cosegmentation method [7], and the Random Walker algorithm [11] (run independently on both images). We also performed experiments using the methods in [4] and [9], but due to the problem of solving a large LP and incorporation of foreground/background seeds respectively, results could not be obtained for the entire dataset described below. To assess relative advantages of the specialized GPCG procedure, we also compared it with a stand-alone implementation of (5) linked with a commercial QP solver (using multiple cores). We provide a qualitative and quantitative overview of the performance w.r.t. state of the art. Additional experiments demonstrate the efficacy of the multiple-image and scale-free segmentation models.2
7.1 Weak Boundary Experiment
For an initial probe of the behavior of the Random Walker as the engine of a cosegmentation method, we replicate the experiment of [11] in a cosegmentation setting. The setup in [11] used a synthetic image consisting of two white triangles which were separated by a black line that contained a large hole, such that the triangles appear to be touching (see Figure 9). This image is challenging to segment into the two triangles because each triangle exhibits exactly the same appearance (white), and there is a substantial portion of the boundary between them for which there is zero contrast. Consequently, segmentation algorithms which rely exclusively on appearance priors or boundary contrast are unable to separate the two triangles. In particular, [11] showed that a segmentation driven by a boundary length prior (e.g., graph cuts) was unable to separate these triangles when the inputs consisted of only a single seed in each triangle.
Figure 9.

RW cosegmentation’s performance on weak boundaries. Columns (left to right): (1) Identical pair of images, with seeds placed only on the top image; (2) Output potentials of random walker; (3) Output of graph-cuts based cosegmentation; (4) Identical histogram bin assignments for both images (common colors indicate similar pixels, considered both inter and intra image).
We may investigate the same scenario in a cosegmentation setting. Specifically, if we let an image pair consist of two replicas of the “two triangles” image from [11], we may supply one seed in each of the triangles of only one of the images. Further, we set up a synthetic histogram in such a way that the desired (i.e., perceptual) partition incurs zero histogram variation penalty relative to $E_{\mathrm{global}}$ in (1). An example of such a histogram pair is shown in Fig. 9 (rightmost column) where common colors indicate similarity among pixels in both images (based on some arbitrary feature). Given this input image pair, our cosegmentation algorithm based on the Random Walker is able to successfully segment both images into two triangles (see Fig. 9, second column), despite the fact that each triangle exhibits the same appearance and is not well separated by the boundary. In contrast, if we apply the traditional cosegmentation model which penalizes boundary length to the same problem, these two images are poorly segmented as a result of the difficulty of the problem instance (see Fig. 9, third column). Note that while this experiment is artificially constructed, it nonetheless suggests that Cosegmentation based on the Random Walker will retain the beneficial properties of the Random Walker segmentation algorithm and some of its advantages over boundary length regularization.
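The single-image behavior that this experiment builds on can be illustrated with a small sketch of the Random Walker of [11]. It is not the cosegmentation model of this paper: it solves the standard Dirichlet problem on one tiny synthetic image (two white regions separated by a black line with a hole, a simplified stand-in for the two triangles). The grid size, the value beta = 5, the seed positions, and the dense solver are assumptions chosen to keep the example short.

```python
import numpy as np

def random_walker(image, seeds, beta=5.0):
    """Foreground potentials on a 4-connected grid, following [11].

    image : (H, W) float array with intensities in [0, 1]
    seeds : dict mapping (row, col) -> 1.0 (foreground) or 0.0 (background)
    """
    h, w = image.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    g = image.ravel()
    L = np.zeros((n, n))
    # Horizontal and vertical neighbour pairs.
    for a, b in ((idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])):
        a, b = a.ravel(), b.ravel()
        wgt = np.exp(-beta * (g[a] - g[b]) ** 2)  # edge weights
        L[a, b] -= wgt
        L[b, a] -= wgt
        np.add.at(L, (a, a), wgt)
        np.add.at(L, (b, b), wgt)
    seeded = np.array([idx[r, c] for r, c in seeds])
    values = np.array(list(seeds.values()), dtype=float)
    free = np.setdiff1d(np.arange(n), seeded)
    # Dirichlet problem: L_U x_U = -B x_M, with seed values x_M held fixed.
    x = np.zeros(n)
    x[seeded] = values
    rhs = -L[np.ix_(free, seeded)] @ values
    x[free] = np.linalg.solve(L[np.ix_(free, free)], rhs)
    return x.reshape(h, w)

image = np.ones((5, 7))
image[:, 3] = 0.0  # black separating line
image[2, 3] = 1.0  # hole in the line: zero contrast at this pixel
potentials = random_walker(image, {(2, 1): 1.0, (2, 5): 0.0})
labels = potentials > 0.5
```

With one seed on each side, every pixel left of the line receives a potential above 0.5 and every pixel right of it a potential below 0.5, even though the two regions have identical appearance and touch through the hole.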
7.2 Datasets
In order to leverage all available test data we aggregated all images provided by the authors in iCoseg [2], Momi-coseg [3] and Pseudoflow-coseg [7], and further supplemented them with a few additional images from the web and [28]. In order to compare with algorithms that only handle image pairs we selected a dataset with 50 pairs of images (100 total). For the > 2 image case we used the iCoseg dataset from [2]. Since a number of image sets from [2] share only semantically similar foreground objects and are unsuitable for cosegmentation with common appearance models, we selected 88 subsets comprising 389 of the 643 iCoseg images (also observed in [13]).
7.3 Running time complexity
We now discuss what we consider the strongest advantage of this framework. We show an example in Fig. 10 of the running time of the proposed model relative to [7], as a function of decreasing entropy (number of bins). The plot suggests that for a realistic range, our implementation has a negligible computation time relative to [7] and the Cplex-based option – in fact, our curve almost coincides with the x-axis (and even this cost was dominated primarily by the overhead from problem setup done on the CPU). For the most expensive data point shown in Fig. 10, the model from [7] generated $10^7$ auxiliary nodes (about 12 GB of memory). Due to the utility of low entropy histograms, these experiments show a salient benefit of our framework and its immunity to the ‘coarseness’ of the appearance model. Over all 128 × 128 images, the wall-clock running time of our CUDA-based model was 10.609 ± 5.230 seconds (a significant improvement over both [7, 9]). The time for a Cplex-driven method (utilizing four cores in parallel) was 17.982 ± 5.07 seconds, but this increases sharply with greater problem size.
Figure 10.

Variation in run-time with histogram granularity (a) and image size (b) relative to a Cplex-based implementation and Pseudoflow-coseg [7].
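To see why coarse histograms are costly for constructions that add one auxiliary node per pair of similar pixels, the sketch below counts such pairs for two 128 × 128 images. It assumes that "similar" means "assigned to the same bin, one pixel from each image", and it uses evenly filled bins; both are simplifications for illustration, and the function name is hypothetical.

```python
import numpy as np

def pair_node_count(bins_1, bins_2, n_bins):
    """Number of cross-image pixel pairs that fall in the same bin."""
    h1 = np.bincount(np.ravel(bins_1), minlength=n_bins).astype(np.int64)
    h2 = np.bincount(np.ravel(bins_2), minlength=n_bins).astype(np.int64)
    # Each bin contributes (pixels in image 1) * (pixels in image 2) pairs.
    return int(np.dot(h1, h2))

n_pixels = 128 * 128
counts = {}
for n_bins in (256, 64, 16):
    bins = np.arange(n_pixels) % n_bins  # evenly filled bins
    counts[n_bins] = pair_node_count(bins, bins, n_bins)
```

For evenly filled bins the count is n_pixels² / n_bins, so dividing the number of bins by four multiplies the number of pairs by four; at 16 bins this small example already exceeds sixteen million pairs.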
An artificial image shown in Figure 11 was used in order to allow for isolation of specific variables, and mitigate variability (in optimization time) as a function of the specific problem instance.
Figure 11.

Artificial image used in computation time experiments (a) and corresponding histogram bins (b). Seed points were placed as shown in both foreground and background. To create a cosegmentation problem, the same image was used twice.
With an increase in the image size, the running time of the model is expected to increase for two reasons. First, the number of pixels to segment determines the size of the input problem and the dimensionality of the decision variables in the optimization. Second, if the histogram is generated the same way across different sizes, the number of pixels in each histogram bin also increases. An analysis of this behavior is presented in Figure 10. The image shown in Fig. 11 was generated for various sizes, with the number of pixels along each side plotted along the x axis in Fig. 10. Our specialized GPU-based Cosegmentation library and an implementation based on the Cplex solver were used for Random Walker Cosegmentation. In this result, we see only a marginal increase in the overall running time of our model (the curve stays close to the x-axis); on the other hand, the running time of the Cplex-based implementation increases quickly as we increase the size of the images. Notice how this difference is substantially magnified for larger images. The reason is that our algorithm distributes each individual BLAS operation over the GPU’s computational units; as a result, higher-dimensional operations take better advantage of the parallelization in the model presented in this paper. We believe our method is thus especially suited for large images where computation time is a consideration.
7.4 Effect of λ parameter
The images in Fig. 12 demonstrate the role of the histogram consistency bias, λ in equation (5). For a very small λ, the segmentation probabilities are diffuse (compare to the independent random walker results from Fig. 15); as the influence of the bias grows, the consistency between the histograms makes the partitions more pronounced.
Figure 12.

Effect of varying the λ parameter on an example image, for λ = $10^{-8}$ in Column 2 and λ = $10^{-6}$ in Column 3. A segmentation potential biased towards matching the histograms makes the partitions more pronounced.
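The role of λ can be explored on a toy problem. The sketch below assumes that the objective in (5) has the form $x_1^T L_1 x_1 + x_2^T L_2 x_2 + \lambda \|H_1 x_1 - H_2 x_2\|_2^2$ with seed values held fixed, where $H_i$ assigns pixels to histogram bins; that form is our reading and is not restated in this section. It also drops the box constraint on the potentials, uses two 1-D "images" with hypothetical seed positions, and solves the resulting linear system densely. The λ values are chosen for the toy scale and are unrelated to those in Fig. 12.

```python
import numpy as np

def chain_laplacian(g, beta=5.0):
    """Graph Laplacian of a 1-D chain with weights exp(-beta * (g_i - g_j)^2)."""
    w = np.exp(-beta * np.diff(g) ** 2)
    L = np.zeros((g.size, g.size))
    i = np.arange(g.size - 1)
    L[i, i + 1] = L[i + 1, i] = -w
    L[i, i] += w
    L[i + 1, i + 1] += w
    return L

def coseg_potentials(g1, g2, bins1, bins2, seeds, lam, n_bins):
    """Minimize the assumed quadratic objective; return potentials and mismatch."""
    n1 = g1.size
    H1 = np.eye(n_bins)[:, bins1]  # (n_bins, n1) bin-assignment matrix
    H2 = np.eye(n_bins)[:, bins2]
    C = np.hstack([H1, -H2])       # C @ z = H1 x1 - H2 x2
    n = n1 + g2.size
    Q = np.zeros((n, n))
    Q[:n1, :n1] = chain_laplacian(g1)
    Q[n1:, n1:] = chain_laplacian(g2)
    Q += lam * (C.T @ C)
    fixed = np.array(sorted(seeds))
    vals = np.array([seeds[k] for k in sorted(seeds)], dtype=float)
    free = np.setdiff1d(np.arange(n), fixed)
    z = np.zeros(n)
    z[fixed] = vals
    # Stationarity on the unseeded variables: Q_UU z_U = -Q_UM z_M.
    z[free] = np.linalg.solve(Q[np.ix_(free, free)],
                              -Q[np.ix_(free, fixed)] @ vals)
    return z, float(np.linalg.norm(C @ z))

g = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
bins = np.array([0, 0, 0, 1, 1, 1])
# Stacked indexing: image 1 is 0..5, image 2 is 6..11; seeds differ per image.
seeds = {0: 1.0, 5: 0.0, 7: 1.0, 10: 0.0}
results = {lam: coseg_potentials(g, g, bins, bins, seeds, lam, 2)[1]
           for lam in (0.0, 0.1, 10.0)}
```

In this toy setting the histogram mismatch $\|H_1 x_1 - H_2 x_2\|$ shrinks as λ grows, which is the mechanism behind the sharper partitions described above.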
Figure 15.

Columns (1–2): Input images; Columns (3–4) segmentation potentials from independent random walker runs on the two images; Columns (5–6) Segmentation potentials from Random Walker based cosegmentation. Note that the object boundaries have become significantly more pronounced because of the histogram constraint.
7.5 Performance w.r.t. pair methods
We evaluated the quality of segmentations (0/1 loss) on the 50 image pairs described above, relative to Pseudoflow-coseg [7], LP [4] (only partial), and discriminative clustering [9]. As in [11], a few seeds were placed to specify foreground and background regions, and given to all methods. Representative samples on the images from [2] using the method in [7, 9] are shown in Fig. 13. Averaged over the pair dataset, the segmentation accuracy of our method was 93.6 ± 2.9%, whereas the gross values for the algorithms from [7] and [9] were 89.1% and 84.1% respectively.
Figure 13.

Comparison results on example images (columns 1,2) of the Pseudoflow based method of [7] (columns 3,4) and the discriminative clustering approach of [9] (columns 5,6), with segmentation from RW-based cosegmentation (columns 7,8).
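The accuracy figures in this section are pixelwise 0/1 scores. A minimal sketch of such a measure is given below; the threshold of 0.5 on the potentials and the function name are assumptions, since the exact thresholding used in the experiments is not stated here.

```python
import numpy as np

def segmentation_accuracy(potentials, ground_truth, threshold=0.5):
    """Fraction of pixels whose thresholded label matches the ground truth."""
    predicted = np.asarray(potentials) > threshold
    truth = np.asarray(ground_truth).astype(bool)
    return float((predicted == truth).mean())

potentials = np.array([[0.9, 0.8],
                       [0.4, 0.6]])
ground_truth = np.array([[1, 1],
                         [0, 0]])
accuracy = segmentation_accuracy(potentials, ground_truth)  # 3 of 4 pixels agree
```

The 0/1 loss is then one minus this accuracy, averaged over the images of a pair or set.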
7.6 Performance on 2+ images
Across the iCoseg dataset we achieved an accuracy of 93.7% with seeds provided by 5 different users. The algorithm of [9] achieves an accuracy of 82.2% across the dataset (excluding some for which the implementation provided by the authors did not complete). Representative image sets and accuracy comparisons appear in Table 1 and Figure 14.
Table 1.
Segmentation accuracy for some iCoseg image sets. Subsets were chosen which have similar appearance under a histogram model.
|
Figure 14.

Interactive segmentation results using the multi-image model across five images from the “Kendo” set of iCoseg. As the number of images increases, less user input is required, as seen by the lack of such for the middle image.
We note that these results must be interpreted with the caveat that different methods respond differently to seed locations and the level of discrimination offered by the underlying appearance model. Since most cosegmentation models (including this paper) share the same basic construction at their core (i.e., image segmentation with an appearance model constraint), variations in performance are in part due to the input histograms. The purpose of our experiments here is to show that at the very least one can expect similar (if not better) qualitative results with our model, but with more flexibility and quite significant computational advantages.
7.7 Comparisons to independent Random Walker runs
In Fig. 15, we present qualitative results from our algorithm, and from independent runs of Random Walker (both with up to two seeds per image). A trend was evident on all images – the probabilities from independent runs of Random Walker on the two images were diffuse and provide poorer boundary localization. This is due to the lack of global knowledge of the segmentation in the other image. Random walker based Cosegmentation is able to leverage this information, and provides better contrast and crisp boundaries for thresholding (a performance boost of up to 10%).
8 Conclusions
We present a new framework for the cosegmentation problem based on the Random Walker segmentation approach. While almost all other cosegmentation methods view the problem in the MRF (graph-cuts) setting, our algorithm translates many of the advantages of Random Walker to the task of simultaneously segmenting common foregrounds from related images. Significantly, our formulation completely eliminates a practical limitation in current (nonparametric model based) Cosegmentation methods that requires the overall image histogram to be approximately flat (in order to restrict the number of auxiliary nodes added). Our model extends nicely to the multi-image setting using a penalty with statistical justification. A further extension allows model-based segmentation which is independent of the relative scales of the model and target foregrounds. We discuss its optimization specific properties, give a state of the art GPU based library, and show quantitative and qualitative performance of the method.
Figure 2.

Segmentation using the model of section 4 on a set of images from the iCoseg dataset with differences in scale. h̄ from an image in the same set was applied as a prior.
Figure 4.

A series of expanding sublevel sets of f (solid ellipses) and the corresponding minimal intersecting sublevel sets of g (dashed ellipses). The gradient at a boundary is normal to a supporting line, and each pair of sublevel sets intersect only along this line.
Figure 5.

Illustration of the lower bound used in Lemma 4.11. If we only have the sampled points shown, we can nonetheless guarantee that a function which is one-sided Lipschitz will lie above the dashed lines.
Acknowledgments
This work is funded via NIH 5R21AG034315-02 and NSF 1116584. Partial support was provided by UW-ICTR through an NIH CTSA Award 1UL1RR025011 and NIH grant 5P50AG033514 to W-ADRC. Funding for M.D.C. was provided by NLM 5T15LM007359 via the CIBM Training Program. The authors would like to thank Petru M. Dinu for consultations on GPGPU development.
Footnotes
Here ∂f denotes the subdifferential of f.
The full set of segmentation results and code is available at http://pages.cs.wisc.edu/~mcollins/pubs/cvpr2012.html
Contributor Information
Maxwell D. Collins, Email: mcollins@cs.wisc.edu.
Jia Xu, Email: jiaxu@cs.wisc.edu.
Leo Grady, Email: leo.grady@siemens.com.
Vikas Singh, Email: vsingh@biostat.wisc.edu.
References
- 1. Rother C, Kolmogorov V, Minka T, Blake A. Cosegmentation of image pairs by histogram matching - Incorporating a global constraint in MRFs. CVPR. 2006
- 2. Batra D, Kowdle A, Parikh D, Luo J, Chen T. Interactive cosegmentation with intelligent scribble guidance. CVPR. 2010
- 3. Chu W, Chen C, Chen C. MOMI-cosegmentation: Simultaneous segmentation of multiple objects among multiple images. ACCV. 2010
- 4. Mukherjee L, Singh V, Dyer C. Half-integrality based algorithms for cosegmentation of images. CVPR. 2009
- 5. Vicente S, Kolmogorov V, Rother C. Cosegmentation revisited: Models & optimization. ECCV. 2010
- 6. Lee YJ, Grauman K. Collect-cut: Segmentation with top-down cues discovered in multi-object images. CVPR. 2010
- 7. Hochbaum DS, Singh V. An efficient algorithm for cosegmentation. ICCV. 2009
- 8. Kowdle A, Batra D, Chen WC, Chen T. iModel: Interactive cosegmentation for object of interest 3D modeling. ECCV Workshop. 2010
- 9. Joulin A, Bach F, Ponce J. Discriminative clustering for image cosegmentation. CVPR. 2010
- 10. Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts. PAMI. 2001;23(11):1222–1239.
- 11. Grady L. Random walks for image segmentation. PAMI. 2006;28(11):1768–1783. doi: 10.1109/TPAMI.2006.233.
- 12. Mukherjee L, Singh V, Peng J. Scale invariant cosegmentation for image groups. CVPR. 2011
- 13. Vicente S, Rother C, Kolmogorov V. Object cosegmentation. CVPR. 2011
- 14. Chang K, Liu T, Lai S. From co-saliency to cosegmentation: An efficient and fully unsupervised energy minimization model. CVPR. 2011
- 15. Sinop AK, Grady L. A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm. ICCV. 2007
- 16. Levin A, Lischinski D, Weiss Y. A closed-form solution to natural image matting. PAMI. 2008;30(2):228–242. doi: 10.1109/TPAMI.2007.1177.
- 17. Cui J, Yang Q, Wen F, Wu Q, Zhang C, Van Gool L, Tang X. Transductive object cutout. CVPR. 2008
- 18. Winn J, Criminisi A, Minka T. Object categorization by learned universal visual dictionary. ICCV. 2005
- 19. Bazaraa MS, Sherali HD, Shetty CM. Nonlinear Programming. John Wiley & Sons, Inc; 2003.
- 20. Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; 2004.
- 21. Frank R, Schneid J, Ueberhuber CW. Stability properties of implicit Runge-Kutta methods. SIAM J on Numerical Analysis. 1985;22(3):497–514.
- 22. Nocedal J, Wright SJ. Numerical Optimization. 2nd ed. Springer; 2006.
- 23. More JJ, Toraldo G. On the solution of large quadratic programming problems with bound constraints. SIAM J on Optimization. 1991;1(1):93–113.
- 24. Lee S, Wright SJ. Implementing algorithms for signal and image reconstruction on graphical processing units. Technical report. University of Wisconsin Madison; 2008.
- 25. Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. PAMI. 2005;27(10):1615–1630. doi: 10.1109/TPAMI.2005.188.
- 26. Gonzalez TF. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science. 1985;38:293–306.
- 27. Vedaldi A, Fulkerson B. VLFeat: An open and portable library of computer vision algorithms. 2008. http://www.vlfeat.org/
- 28. Tsai D, Flagg M, Rehg JM. Motion coherent tracking with multi-label MRF optimization. British Machine Vision Conference (BMVC). 2010
