Published in final edited form as: Comput Stat Data Anal. 2016 Sep;101:137–147. doi: 10.1016/j.csda.2016.03.002. Author manuscript; available in PMC 2017 Sep 1.

Maximum likelihood estimation of the mixture of log-concave densities

Hao Hu a,*, Yichao Wu a, Weixin Yao b
PMCID: PMC4820769  NIHMSID: NIHMS767573  PMID: 27065505

Abstract

Finite mixture models are useful tools and can be estimated via the EM algorithm. A main drawback is the strong parametric assumption about the component densities. In this paper, a much more flexible mixture model is considered, which assumes each component density to be log-concave. Under fairly general conditions, the log-concave maximum likelihood estimator (LCMLE) exists and is consistent. Numerical examples are provided to demonstrate that the LCMLE improves the clustering results when compared with the traditional MLE for parametric mixture models.

Keywords: Consistency, Log-concave maximum likelihood estimator (LCMLE), Mixture model

1. Introduction

The finite mixture model (McLachlan & Peel, 2000; McNicholas & Murphy, 2008) provides a flexible methodology for both theoretical and practical analysis. It has a density of the form

f(x) = \sum_{j=1}^{K} \lambda_j g_j(x; \theta_j), \qquad x \in \mathbb{R}^d,   (1.1)

where λ1, …, λK are the mixing proportions and the gj(x; θj)'s are component densities. The unknown parameters in the mixture model (1.1) can be estimated by the EM algorithm; see, e.g., Dempster et al. (1977) and McLachlan & Krishnan (2007). One major drawback of the traditional mixture model (1.1) is the strong parametric assumption about the component density gj. It is often too restrictive, and the density estimation may be inaccurate due to model misspecification. Another drawback is that each parametric model requires its own EM algorithm tailored to the assumed component densities.

To relax the parametric assumption, nonparametric shape constraints are becoming increasingly popular. In this paper, we make one fairly general shape constraint for our mixture model: we assume that each component density is log-concave. A density g is log-concave if log g is concave. Examples of log-concave densities include the normal, Laplace, and logistic densities, as well as the gamma and beta densities under certain parameter constraints. Log-concave densities have many attractive properties, as described by Balabdaoui et al. (2009). Their nonparametric maximum likelihood estimators were studied by Dümbgen & Rufibach (2009), Cule et al. (2010), Cule & Samworth (2010), Chen & Samworth (2013), Pal et al. (2007), and Dümbgen et al. (2011) (referred to as DSS 2011 hereafter). The convergence rates of these estimators for log-concave densities were studied by Doss & Wellner (2013) and Kim & Samworth (2014). Such estimators provide generality and flexibility without any tuning parameters.
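As a quick numerical illustration of the shape constraint (a minimal sketch of ours, not part of the original analysis), one can check log-concavity of a candidate density by verifying that its log-density has non-positive second differences on a grid; the Gamma(2, 1) case below is one of the examples mentioned above.

## Numerical check of log-concavity for Gamma(2, 1): second differences of the
## log-density should be (up to rounding) non-positive on a grid.
x <- seq(0.1, 10, by = 0.01)                 # grid away from the boundary x = 0
logf <- dgamma(x, shape = 2, rate = 1, log = TRUE)
all(diff(logf, differences = 2) <= 1e-10)    # TRUE: consistent with concavity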

In our model, we assume that x1, …, xn are independent d-dimensional random variables with distribution Q0 and the mixture density f0. The mixture density f0 belongs to a given class

\mathcal{F} = \Big\{ f : f(x) = \sum_{j=1}^{K} f_j(x) = \sum_{j=1}^{K} \lambda_j \exp\{\phi_j(x)\}, \ \lambda \in \Lambda, \ \phi \in \Phi \Big\},   (1.2)

where λ = (λ1, …, λK), \Lambda = \{(\lambda_1, \ldots, \lambda_K) : 0 < \lambda_j < 1, \ \sum_{j=1}^{K} \lambda_j = 1\}, ϕ = (ϕ1, …, ϕK), and Φ = {(ϕ1, …, ϕK) : each ϕj is concave}. We assume that each ϕj is continuous and coercive in the sense that ϕj(x) → −∞ as ||x|| → ∞ (j = 1, …, K).

One issue for mixture models is that the likelihood might be unbounded in some cases. For example, the likelihood function for a normal mixture takes the form L(\theta \mid x) = \prod_{i=1}^{n} \big(\lambda g(x_i \mid \mu_1, \sigma_1^2) + (1 - \lambda) g(x_i \mid \mu_2, \sigma_2^2)\big), where \theta = (\lambda, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2) with \sigma_1^2, \sigma_2^2 > 0 and \lambda \in (0, 1), and g(\cdot \mid \mu, \sigma^2) denotes the normal density with mean μ and variance σ². When μ1 = x1 and σ1² → 0, L(θ|x) → ∞ (see Section 3.10 of McLachlan & Peel (2000) for a detailed discussion). Many methods have been proposed to resolve the unboundedness of the mixture likelihood; see, for example, Hathaway (1985), Chen et al. (2008), and Yao (2010). Note that, as for traditional normal mixture models with unequal variances, the likelihood functions for mixtures of log-concave densities are unbounded as well. Thus, similar to Hathaway (1985), we define the LCMLE on a constrained parameter space. Let M_j(\phi) = \max_{x \in \mathbb{R}^d} \phi_j(x), M_{(1)}(\phi) = \min_j M_j(\phi), and M_{(K)}(\phi) = \max_j M_j(\phi), and define the ratio \mathcal{S}(\phi) = M_{(1)}(\phi)/M_{(K)}(\phi). Following Hathaway (1985), we restrict attention to the constrained subspace \Phi_\eta = \{\phi \in \Phi : |\mathcal{S}(\phi)| \ge \eta > 0\} for some η ∈ (0, 1]. This restriction rules out estimates in which the maxima of the component log-densities differ too much. By restricting to Φη, we focus on f ∈ ℱη, where

\mathcal{F}_\eta = \Big\{ f : f(x) = \sum_{j=1}^{K} f_j(x) = \sum_{j=1}^{K} \lambda_j \exp\{\phi_j(x)\}, \ \lambda \in \Lambda, \ \phi \in \Phi_\eta \Big\}.   (1.3)

Let Qn be the empirical distribution of X1, …, Xn. The (restricted) log-concave maximum likelihood estimator (LCMLE) is

\hat{f}_n = f(\cdot \mid Q_n) = \arg\max_{f \in \mathcal{F}_\eta} \int \log(f)\, dQ_n.   (1.4)

In practice, similar to Hathaway (1985), picking η can be tricky in some extreme cases. If η is too small, there is a chance that a boundary point with |𝒮(ϕ)| = η maximizes the log-likelihood, in which case the solution depends on the choice of η. In this paper, we do not focus on the issue of choosing η; the constrained subspace Φη is mainly used for theoretical development. In our empirical experience, if we start the algorithm from a reasonable initial value, such as the maximum likelihood estimate assuming all components are normal with equal variance, the unboundedness issue is very rare.

Many methods have been proposed to relax the parametric assumption in (1.1). Hunter et al. (2007), Bordes et al. (2006a), Butucea & Vandekerkhove (2014), and Chee & Wang (2013) considered extensions of (1.1) in which all component densities are symmetric but unknown. Bordes et al. (2006b), Bordes & Vandekerkhove (2010), Hohmann & Holzmann (2013), Xiang et al. (2014), and Ma & Yao (2015) considered extensions of (1.1) with K = 2 in which one of the component densities is symmetric but unknown. Mixtures of log-concave densities have been studied by Chang & Walther (2007), Cule et al. (2010), and Balabdaoui & Doss (2014). Chang & Walther (2007) provided an EM-type algorithm and demonstrated sound numerical results in a simulation study. Cule et al. (2010) applied the log-concave mixture model to the Wisconsin breast cancer data set. Balabdaoui & Doss (2014) considered the special case in which all components share the same symmetric log-concave density but have different location parameters, and proved the √n-consistency of their proposed M-estimators of the mixing proportions and location parameters. Note that these models are special cases of the family ℱ, so their estimators and asymptotic results cannot be applied here. For example, the mixture of normal distributions with different component means and variances belongs to ℱ but does not belong to the model family considered by Balabdaoui & Doss (2014).

To the best of our knowledge, none of the existing works has studied the theoretical properties of the estimator for the log-concave mixture model (1.2) under such general conditions. This paper aims to fill this gap. We show that the LCMLE (in the restricted subset ℱη) exists and is consistent under fairly general conditions. We point out, however, that extending the properties of the log-concave MLE to mixtures of log-concave densities is not trivial. The log-density ln = l(·|Qn) = log fn is no longer guaranteed to be a concave function. Consequently, many of the nice theoretical properties stated in DSS 2011 no longer hold for our mixture model.

The rest of the paper is organized as follows. Section 2 introduces the basic setup, model details, and notation. Section 3 states the theoretical properties. We review the EM-type algorithm for log-concave mixture models in Section 4. Simulation and real data studies are conducted in Sections 5 and 6. We end the article with a short conclusion in Section 7. The proofs and lemmas are presented in the appendix.

2. Log-concave maximum likelihood estimator

Let 𝒬 = 𝒬(d) be the family of all distributions Q on ℝd. Our goal is to maximize a log-likelihood-type functional:

L(\phi, \lambda, \pi, Q) = \int \log\Big[\sum_{j=1}^{K} \lambda_j \exp\{\phi_j(x)\}\Big]\, dQ(x) - \sum_{j=1}^{K} \pi_j \Big(\int \exp\{\phi_j(x)\}\, dx - 1\Big),   (2.1)

where πj’s are Lagrange multipliers to incorporate the constraint ∫ exp{ϕj(x)}dx = 1 (j = 1, …, K). We define a profile log-likelihood:

L(Q) = \sup_{\phi \in \Phi_\eta,\, \lambda \in \Lambda,\, \pi} L(\phi, \lambda, \pi, Q).   (2.2)

If, for fixed Q, (ψ, λ*, π*) maximizes L(ϕ, λ, π, Q), then it automatically satisfies:

\pi_j^* = E\big(\pi^*(j \mid x)\big) = \int \frac{\lambda_j^* \exp\{\psi_j(x)\}}{\sum_{h=1}^{K} \lambda_h^* \exp\{\psi_h(x)\}}\, dQ(x);   (2.3)
\int \exp\{\psi_j(x)\}\, dx = 1 \quad (j = 1, 2, \ldots, K).   (2.4)

Note that, differing from the non-mixture setting in DSS 2011, π*j is not equal to 1.

To verify this, note that ϕ + c ∈ Φ for any fixed vector of functions ϕ ∈ Φ and arbitrary c = (c1, …, cK)T ∈ ℝK, and

\frac{\partial L(\psi + c, \lambda^*, \pi^*, Q)}{\partial c_h}\Big|_{c=0} = \int \frac{\lambda_h^* \exp\{\psi_h(x)\}}{\sum_{j=1}^{K} \lambda_j^* \exp\{\psi_j(x)\}}\, dQ(x) - \pi_h^* \int e^{\psi_h(x)}\, dx = 0,
\frac{\partial L(\psi, \lambda^*, \pi^*, Q)}{\partial \pi_h} = 1 - \int \exp\{\psi_h(x)\}\, dx = 0.

The maximizer (ψ, λ*) yields the maximizing log-density l(x) = \log \sum_{j=1}^{K} \lambda_j^* e^{\psi_j(x)}.

3. Theoretical Properties

Before we state the main theorems, we first define the convex support of a distribution.

Definition

For any distribution Q, let Q(C) denote the probability of a set C under Q. The convex support of Q is defined as

\mathrm{csupp}(Q) = \bigcap\{C : C \subseteq \mathbb{R}^d \text{ closed and convex},\ Q(C) = 1\}.

The convex support is itself closed and convex with Q(csupp(Q)) = 1.

In the following, we define:

  • 𝒬1 = {Q ∈ 𝒬 : ∫ ||x||dQ < ∞} (throughout, ||x|| denotes the Euclidean norm).

  • 𝒬0 = {Q ∈ 𝒬 : interior(csupp(Q)) ≠ ∅}.

Theorem 1

For any Q ∈ 𝒬1 ∩ 𝒬0, the value of L(Q) is real and there exists a maximizer:

(\psi, \lambda^*, \pi^*) = \arg\max_{\phi \in \Phi_\eta,\, \lambda \in \Lambda,\, \pi} L(\phi, \lambda, \pi, Q) \quad \text{such that} \quad \int e^{\psi_j(x)}\, dx = 1 \ \text{for } j = 1, \ldots, K.

Next, we establish the consistency of the estimated mixture density. In the following, convergence of distributions is understood with respect to the Mallows distance D1: D_1(Q, Q') = \inf_{(X, X')} E\|X - X'\|, where Q and Q′ are two distributions and the infimum is taken over all pairs (X, X′) such that X ~ Q and X′ ~ Q′. Convergence of Qn to Q0 with respect to the Mallows distance, i.e. limn→∞ D1(Qn, Q0) = 0, is equivalent to Qn converging weakly to Q0, denoted Qn →w Q0, together with ∫ ||x||dQn(x) → ∫ ||x||dQ0(x) as n → ∞.

Theorem 2

Let Qn be a sequence such that limn→∞ D1(Qn, Q0) = 0 for some Q0 ∈ 𝒬1 ∩ 𝒬0. Then,

\lim_{n \to \infty} L(Q_n) = L(Q_0).

Let the ϕnj's and λnj's be the maximizers corresponding to the profile log-likelihood L(Qn), i.e., fn(x) = Σj λnj exp{ϕnj(x)} = f(·|Qn) ∈ ℱη. For f0(x) = f(·|Q0) ∈ ℱη, we have:

\lim_{n \to \infty,\, x \to y} f_n(x) = f_0(y) \quad \text{for all } y \in \{f_0 > 0\},   (3.1)
\limsup_{n \to \infty,\, x \to y} f_n(x) \le f_0(y) \quad \text{for all } y \in \mathbb{R}^d,   (3.2)
\lim_{n \to \infty} \int |f_n(x) - f_0(x)|\, dx = 0.   (3.3)

The above theorem shows the consistency of the estimated mixture density. If we further assume that the true mixture density f0(x) is identifiable, then the estimated component densities and mixing proportions are also consistent. We discuss the identifiability issue further in Section 7.

4. EM-type algorithm

The EM algorithm for estimating log-concave mixture densities has already been developed by Chang & Walther (2007); here we briefly summarize it. We first randomly generate initial values for the normal-mixture EM algorithm and run it until convergence, and then use its output as starting values for our EM-type algorithm. We treat the observed data X = (x1, …, xn)T ∈ ℝn×d as incomplete and define the missing data Z = (z1, …, zn)T, where xi = (xi,1, …, xi,d)T and zi is a K-dimensional vector whose j-th element is given by:

z_{ij} = \begin{cases} 1 & \text{if } x_i \text{ belongs to the } j\text{th group}, \\ 0 & \text{otherwise}. \end{cases}

So the complete log-likelihood is:

\log f(\phi, \lambda; X, Z) = \log \prod_{i=1}^{n} \prod_{j=1}^{K} \big[\lambda_j e^{\phi_j(x_i)}\big]^{z_{ij}} = \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij}\big[\log \lambda_j + \phi_j(x_i)\big].

In the E-step, we replace zij by

z_{ij}^{(t+1)} = \frac{\hat{\lambda}_j^{(t)} e^{\hat{\phi}_j^{(t)}(x_i)}}{\sum_{h=1}^{K} \hat{\lambda}_h^{(t)} e^{\hat{\phi}_h^{(t)}(x_i)}}.

In the M-step, we first update λ by \hat{\lambda}_j^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n} z_{ij}^{(t+1)}, j = 1, …, K. We then update ϕj by maximizing \sum_{i=1}^{n} z_{ij}^{(t+1)} \phi_j(x_i) with respect to ϕj through the function mlelcd in the R package LogConcDEAD (Cule et al., 2009), obtaining the estimator \hat{\phi}_j^{(t+1)} for j = 1, …, K. The estimation of ϕ̂j has been studied by Walther (2002) and Rufibach (2007). Given i.i.d. data X1, …, Xn from a log-concave density f, the log-concave maximum likelihood estimator f̂n exists uniquely and has support on the convex hull of the data (by Theorem 2 of Cule et al. (2010)). The estimated log-density log f̂n is a piecewise linear function whose knots form a subset of {X1, …, Xn}. Walther (2002) and Rufibach (2007) provided algorithms for computing f̂n(Xi), i = 1, …, n; the entire log-density log f̂n can then be computed by linearly interpolating between log f̂n(X(i)) and log f̂n(X(i+1)). Walther (2002) and Rufibach (2007) also pointed out that it is natural to incorporate observation weights, as required by an EM-type algorithm: in our algorithm, z_{1j}^{(t+1)}, …, z_{nj}^{(t+1)} serve as weights for x1, …, xn when estimating the log-concave component ϕj, j = 1, …, K. The algorithm stops once the increment ℓ(t+1) − ℓ(t) falls below 10−7, where \ell^{(t)} = \sum_{i=1}^{n} \log \sum_{j=1}^{K} \hat{\lambda}_j^{(t)} \exp\{\hat{\phi}_j^{(t)}(x_i)\}.
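The loop below is a minimal R sketch of this EM-type algorithm (not the authors' implementation). It assumes that LogConcDEAD::mlelcd accepts an observation-weight vector w summing to one and that LogConcDEAD::dlcd evaluates a fitted log-concave density at the data points; for simplicity it initializes with softened k-means labels rather than the normal-mixture EM described above.

library(LogConcDEAD)

## EM-type algorithm for a K-component log-concave mixture (sketch).
lcmix_em <- function(x, K, maxit = 100, tol = 1e-7) {
  x <- as.matrix(x); n <- nrow(x)
  ## soften hard k-means labels so every observation gets a positive weight
  z <- 0.9 * model.matrix(~ factor(kmeans(x, K)$cluster) - 1) + 0.1 / K
  loglik_old <- -Inf
  for (t in seq_len(maxit)) {
    ## M-step: mixing proportions and weighted log-concave component fits
    lambda <- colMeans(z)
    fits <- lapply(seq_len(K), function(j) mlelcd(x, w = z[, j] / sum(z[, j])))
    ## E-step: posterior component probabilities (responsibilities)
    dens <- sapply(seq_len(K), function(j) lambda[j] * dlcd(x, fits[[j]]))
    loglik <- sum(log(rowSums(dens)))
    z <- dens / rowSums(dens)
    if (loglik - loglik_old < tol) break   # stopping rule of Section 4
    loglik_old <- loglik
  }
  list(lambda = lambda, fits = fits, posterior = z, loglik = loglik)
}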

To avoid local maxima, we restart the algorithm 20 times and choose the result with the highest log-likelihood. As discussed in Section 1, the unboundedness issue of the log-likelihood does happen infrequently, mostly due to an inappropriate initial value. In our algorithm, we borrow the restarting idea used in many existing EM algorithms for parametric mixture models, e.g., Benaglia et al. (2009): if the log-likelihood diverges to infinity in any iteration, our EM-type algorithm is forced to restart from the beginning with a new randomly chosen initial value.

5. Simulation Results

5.1. Copula procedure to generate multivariate log-concave mixtures

Since the LCMLE involves no tuning parameters, its most attractive application is density estimation in dimension higher than one. To generate data from a multivariate log-concave mixture model, we borrow the copula procedure of Chang & Walther (2007). For a d-dimensional log-concave mixture density, we observe n observations x1, …, xn, where xi = (xi,1, …, xi,d)T ∈ ℝd. To simplify the simulation, we focus on models whose univariate marginal distributions are log-concave and model the dependence structure with a normal copula. Suppose (N1, …, Nd)T is multivariate normal with mean 0 and covariance matrix Σ, and let F1, …, Fd be the CDFs of the desired univariate log-concave distributions. Then, with Φ denoting the standard normal CDF,

x_i = (x_{i,1}, \ldots, x_{i,d})^T = \big(F_1^{-1}(\Phi(N_1)), \ldots, F_d^{-1}(\Phi(N_d))\big)^T.
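For concreteness, the following R sketch (ours, using MASS::mvrnorm; the paper's own code is not shown) carries out this copula step for d = 2 with a N(0, 1) marginal, a Gamma(2, 1) + 2 marginal, and copula correlation 0.5, the component-2 setting later used as Model II.

library(MASS)  # for mvrnorm

## Copula step (sketch): normal copula with unit variances and correlation 0.5,
## marginals F1 = N(0, 1) and F2 = Gamma(2, 1) shifted by +2.
r_copula <- function(n) {
  Sigma <- 0.5 * diag(2) + 0.5                    # unit diagonal, 0.5 off-diagonal
  N <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)    # (N1, N2) ~ N(0, Sigma)
  U <- pnorm(N)                                   # uniform marginals
  cbind(qnorm(U[, 1]),                            # F1^{-1}: standard normal
        qgamma(U[, 2], shape = 2, rate = 1) + 2)  # F2^{-1}: Gamma(2, 1) + 2
}

set.seed(1)
x2 <- r_copula(350)   # e.g. the expected component-2 share of n = 500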

5.2. Significant Improvement when densities are log-concave mixtures

We first generate 500 observations from a univariate log-concave mixture model: 0.3 Logistic(0, 1) + 0.7 Laplace(5, 1) (referred to as Model I). This setup is more general than that of Chang & Walther (2007), who only considered the case in which one component is a location shift of the other. For the multivariate cases, we generate 500 observations via the copula procedure of Section 5.1 for Models II through IV, which are multivariate log-concave mixture models with dimensionality d from 2 to 4. For each model, component 1 (with probability 0.3) is generated from a joint normal distribution N(0, Id); component 2 (with probability 0.7) is generated through a normal copula N(0, 0.5Id + 0.51d), where Id is the d×d identity matrix and 1d is the d×d matrix of ones. The marginal distributions of component 2 are summarized in Table 1.

Table 1.

The simulation setups of Model II – IV.

Model d Marginal Distribution of Component 2
II 2 N (0, 1), and Gamma(2, 1) + 2
III 3 N (0, 1), Gamma(2, 1) − 1, and Beta(4, 1)
IV 4 N (0, 1), Gamma(2, 1) + 2, Beta(4, 1), Laplace(0, 1) + 1

We repeat the simulation 100 times for each model. When evaluating simulation results for mixture models, there is a well-known label switching issue (Stephens, 2000; Yao & Lindsay, 2012). In this paper, we adopt the method of Yao (2015) and find the labels by minimizing the distance between the estimated classification probabilities and the true labels over all permutations. After matching the labels, we compute the mean squared errors of the estimated λ's obtained by the log-concave EM algorithm (MSE2) and by the parametric normal EM algorithm (MSE1) to compare their accuracy. As mixture models also serve as classification methods, we compute the average misclassification number over the 100 replicates (denoted AMN2 for the log-concave EM algorithm and AMN1 for the normal EM algorithm). We are also interested in the difference between the two classification methods. One common measure of the similarity between two clusterings is the Adjusted Rand Index (ARI), which ranges from −1 to 1; see Hubert & Arabie (1985) for a detailed description. We report the average Adjusted Rand Index (AvARI) over the 100 replicates.
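As a concrete reference for this criterion, the small R helper below computes the ARI of Hubert & Arabie (1985) directly from two label vectors; it is only a self-contained sketch (equivalent functions are available in packages such as mclust), and the fit object in the usage comment is hypothetical.

## Adjusted Rand Index between two partitions given as label vectors.
adj_rand_index <- function(a, b) {
  tab <- table(a, b)                      # contingency table of the two clusterings
  comb2 <- function(m) sum(choose(m, 2))  # number of agreeing pairs
  sum_ij <- comb2(tab)
  sum_a  <- comb2(rowSums(tab))
  sum_b  <- comb2(colSums(tab))
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)
}
## e.g. adj_rand_index(true_labels, apply(fit$posterior, 1, which.max))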

We report results over the 100 replicates in Table 2. We observe significantly smaller MSEs for the estimated λ obtained by the log-concave mixture model. For Models I and II in particular, the mean squared errors obtained by the log-concave mixture model are less than half of those obtained by the normal mixture model. In terms of classification, the average misclassification number among the 500 observations is significantly reduced as well. The AvARI indicates that the classification results of the log-concave mixture model and the normal mixture model are quite different, especially for Models I and II.

Table 2.

Simulation results of Model I – IV.

Model d AMN1 AMN2 MSE1 MSE2 AvARI
I 1 17.56 10.86 0.0016 0.0007 0.91
II 2 30.35 13.28 0.0085 0.0013 0.86
III 3 12.43 4.79 0.0010 0.0005 0.93
IV 4 7.97 3.21 0.0006 0.0004 0.95

To compare the classification results for individual replicates, we take d = 4 as an example and show the clustering results in Figure 1. In Figure 1a, each point represents a single replicate from the setup of Model IV; the x-axis gives the number of misclassifications by the normal mixture EM algorithm and the y-axis the number of misclassifications by our log-concave mixture EM algorithm. We observe a clear improvement in the misclassification rates, as the points for all 100 replicates lie below the identity line.

Figure 1.


Four-dimensional clustering result: normal mixture EM-algorithm vs log-concave mixture EM-algorithm by number of misclassifications. The solid lines represent the identity.

To better illustrate the finite sample performance of the LCMLE, we pick one replicate from Model I. To compare the fitted densities with the true densities, Figure 2 plots the true component densities as solid lines and the fitted densities as dashed lines. Even with a sample size of 500, the LCMLE for the log-concave mixture model approximates the true component densities well.

Figure 2.


EM-type algorithm estimation for log-concave mixtures for a single replicate of Model I. Solid line represents the truth and dashed line represents the estimation results. The fitted λ̂ = 0.3076.

5.3. Insignificant penalty when the parametric assumptions are correct

We are also interested in the price we have to pay for this flexibility when the data actually come from normal mixtures. For Models V–VIII, we generate n = 500 observations from a joint normal mixture distribution, in which the first component (with probability 0.4) is a d-dimensional normal distribution with mean 0d and covariance matrix 0.5Id + 0.51d, and the second component (with probability 0.6) is a d-dimensional normal distribution with mean μd and the same covariance matrix, where μ1 = 5, μ2 = (3, 2)T, μ3 = (3, 2, 2)T, and μ4 = (3, 1, 3, 1)T. We again repeat the simulation 100 times and compare the same criteria.
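For illustration, the d = 2 case (Model VI) can be generated as in the sketch below (ours, using MASS::mvrnorm; the parameter values are those stated above).

library(MASS)

## One replicate from Model VI: 0.4 N(0, Sigma) + 0.6 N((3, 2)', Sigma),
## with Sigma = 0.5 * I_2 + 0.5 * 1_2 (unit variances, correlation 0.5).
r_model_vi <- function(n = 500) {
  Sigma <- 0.5 * diag(2) + 0.5
  comp1 <- runif(n) < 0.4
  x <- matrix(NA_real_, n, 2)
  x[comp1, ]  <- mvrnorm(sum(comp1),  mu = c(0, 0), Sigma = Sigma)
  x[!comp1, ] <- mvrnorm(sum(!comp1), mu = c(3, 2), Sigma = Sigma)
  list(x = x, label = ifelse(comp1, 1, 2))
}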

From Table 3, we observe no significant penalty for applying log-concave mixture models instead of normal mixture models. The MSEs and average misclassification numbers for the log-concave mixture model are either almost the same as or only slightly higher than those for the multivariate normal mixture model. The classification results of the log-concave mixture model and the normal mixture model are quite similar, as the AvARI values in Table 3 are close to 1. This is further supported by Figure 1b, which shows the classification results for Model VIII (d = 4): most points lie around the identity line, so there is no significant difference in terms of misclassifications. Consequently, we conclude that the log-concave mixture model is a more flexible methodology that incurs no significant penalty when the data actually come from normal mixtures.

Table 3.

Simulation results of Model V – VIII.

Model d AMN1 AMN2 MSE1 MSE2 AvARI
V 1 3.17 3.51 0.0004 0.0005 0.99
VI 2 33.95 37.12 0.0018 0.0020 0.91
VII 3 21.95 23.57 0.0008 0.0008 0.94
VIII 4 17.71 18.20 0.0018 0.0020 0.96

6. Real Data Application

To further illustrate the performance of log-concave mixture models, we apply the log-concave EM algorithm to the crab data set of Campbell & Mahon (1974), which contains two types of crabs: 100 blue crabs and 100 orange crabs. We focus on the blue crabs, which comprise n1 = 50 males and n2 = 50 females, referred to as groups G1 and G2, respectively. For each crab there are five measurements; we are only interested in two of them, RW (rear width) and BD (body depth), both in mm. Figure 3 gives the scatter plot of RW and BD.
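The same measurements are available as the crabs data in the R package MASS, which gives a convenient way to reproduce Figure 3 (a sketch, assuming that data set matches the one of Campbell & Mahon (1974) analyzed here).

library(MASS)

## Blue crabs only: 50 males (G1) and 50 females (G2); RW and BD in mm.
blue <- subset(crabs, sp == "B")
x <- as.matrix(blue[, c("RW", "BD")])
plot(x, col = ifelse(blue$sex == "M", 1, 2),
     xlab = "RW (mm)", ylab = "BD (mm)")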

Figure 3.


Scatter plot of RW (rear width) and BD (body depth) of the Blue Crab data set.

Fitting a two-dimensional, two-component log-concave mixture model results in 18 misclassified observations: fifteen observations from G1 are misclassified into G2 and three observations from G2 are misclassified into G1. The normal mixture model misclassifies 20 observations in total, with two additional observations from G1 misclassified into G2.

7. Conclusion

The log-concave maximum likelihood estimator (LCMLE) provides more flexibility for estimating mixture densities than traditional parametric mixture models. The LCMLE for log-concave mixtures can be computed by an EM-type algorithm. The LCMLE is not sensitive to model misspecification and, consequently, only one implementation of the EM-type algorithm is needed. Through simulation studies, we observed significant improvements in classification and no significant penalty when the parametric assumption is indeed correct.

In this paper, we proved the existence of the LCMLE for log-concave mixture models, as well as the consistency of the estimated mixture density. If the true mixture density is identifiable, then the estimated component densities are also consistent. However, it is not an easy task to prove the overall identifiability of the most general family of mixtures of log-concave distributions in (1.2) from a nonparametric point of view; some restrictive conditions, such as symmetry, are needed to ensure identifiability. Hunter et al. (2007) and Bordes et al. (2006a) proved the identifiability of (1.1) when K = 2 and both component densities are symmetric with different location parameters. Balabdaoui & Doss (2014) considered a special case of (1.2) in which ϕj(x; θj) = ϕ(x − θj) and ϕ is a concave function symmetric about 0; the identifiability of (1.2) then follows from Hunter et al. (2007) and Bordes et al. (2006a) when K = 2.

Acknowledgments

We wish to thank the Associate Editor and two reviewers for their helpful comments and suggestions that led to improvements in this paper. Hu’s research is partially supported by National Institutes of Health grant R01-CA149569. Wu’s research is partially supported by National Institutes of Health grant R01-CA149569 and National Science Foundation grant DMS-1055210. Yao’s research is supported by NSF grant DMS-1461677.

Appendix A: Lemmas

Lemma 1 is taken from Cule & Samworth (2010). Lemmas 2 to 5 are taken from DSS 2011. Lemma 6 is an extension of Lemma 2.13 of DSS 2011.

Lemma 1

For any log-concave distribution Q with density f, there exist finite constants B1 = B1(Q) > 0 and B2 = B2(Q) > 0 such that f(x) ≤ B1 exp(−B2||x||) for all x ∈ ℝd.

Lemma 2

The following properties of Q are equivalent:

  1. csupp(Q) has non-empty interior.

  2. Q(H) < 1 for any hyperplane H ⊂ ℝd.

  3. With Leb denoting the Lebesgue measure on ℝd,
    \lim_{\delta \downarrow 0} \sup\{Q(A) : A \subseteq \mathbb{R}^d \text{ closed and convex},\ \mathrm{Leb}(A) \le \delta\} < 1.

Lemma 3

Let ϕ be a function such that for any x, y ∈ interior(dom(ϕ)) and t ∈ (0, 1) with tx + (1 − t)y ∈ interior(dom(ϕ)), ϕ(tx + (1 − t)y) ≥ tϕ(x) + (1 − t)ϕ(y), and such that, for C ⊆ ℝd, ∫_C e^{ϕ(x)} dx ≤ 1. We define Dq = {x ∈ C : ϕ(x) ≥ q}. For any r < M ≤ max_{x∈ℝd} ϕ(x),

\mathrm{Leb}(D_r) \le \frac{(M - r)^d\, e^{-M}}{\int_0^{M-r} t^d e^{-t}\, dt}.

Lemma 4

Let ϕ̄, ϕ1, ϕ2, … be concave functions with ϕn ≤ ϕ̄. Further assume that the set H = {x : lim infn→∞ ϕn(x) > −∞} is not empty. Then there exist a subsequence (ϕn(k))k of (ϕn)n and a concave function ϕ with H ⊆ dom(ϕ) = {x ∈ ℝd : ϕ(x) > −∞} such that:

\lim_{k \to \infty,\, x \to y} \phi_{n(k)}(x) = \phi(y) \quad \text{for all } y \in \mathrm{interior}(\mathrm{dom}(\phi)),
\limsup_{k \to \infty,\, x \to y} \phi_{n(k)}(x) \le \phi(y) \quad \text{for all } y \in \mathbb{R}^d.

Lemma 5

Suppose (Qn)n is a sequence of distributions converging weakly to some distribution Q, and let h be a nonnegative continuous function. Then

\liminf_{n \to \infty} \int h\, dQ_n \ge \int h\, dQ.

If the stronger statement lim infn→∞ ∫ h dQn = ∫ h dQ < ∞ holds, then for any function f such that |f|/(1 + h) is bounded,

\lim_{n \to \infty} \int f\, dQ_n = \int f\, dQ.

Lemma 6

A point x ∈ ℝd is an interior point of C if and only if h(Q, x) = sup{Q(E) : E ⊆ C, E closed and convex, x ∉ interior(E)}/Q(C) < 1.

Proof

For x ∉ interior(E) with E closed and convex, there exists a unit vector u ∈ ℝd such that E is contained in the closed set HC(x), which is a subset of C:

C \supseteq H_C(x) = \{y \in C : u^T y \le u^T x\} \supseteq E.

By the definition of h(Q, x) we conclude h(Q, x) ≤ Q(HC(x))/Q(C) ≤ 1. There are two cases: E ⊊ HC(x) and E = HC(x). In the case E ⊊ HC(x), h(Q, x) < 1 strictly by definition. In the case E = HC(x), since x ∉ interior(E) but x ∈ HC(x), we conclude x ∈ ∂HC(x). Now if x ∉ interior(C), then by definition h(Q, x) = 1. On the other hand, if h(Q, x) = 1, then Q(HC(x)) = Q(C), which leads to C = HC(x) = E. Combined with x ∉ interior(HC(x)), we conclude that x ∉ interior(C). Consequently, x ∉ interior(C) ⇔ h(Q, x) = 1. Thus, x ∈ interior(C) ⇔ h(Q, x) < 1.

Appendix B: Proof of Theorem 1

We first prove the finiteness of the log-likelihood-type functional.

L(Q) is the supremum of L(ϕ, λ, π, Q) over all ϕ ∈ Φ, λ ∈ Λ, and π ∈ ℝK. If we take the special case ϕ*j(x) = −log(λj) − ||x|| and π = 0, then L(ϕ*, λ, π, Q) = log K − ∫ ||x||dQ > −∞. Consequently, L(Q) > −∞.

Now we show L(Q) < ∞. As discussed at the end of Section 2, we restrict attention to ϕ such that ∫ e^{ϕj(x)} dx = 1 for j = 1, …, K. Consequently, we define the log-density l(x) = log Σ_{j=1}^K λj e^{ϕj(x)} and rewrite the log-likelihood-type functional as L(l, Q) = L(ϕ, λ, π, Q). For convenience, we define the envelope function ϕ̄(x) = maxj ϕj(x), so that ϕ̄(x) ≥ l(x) for every x ∈ ℝd. This function is continuous but possibly non-smooth on (d − 1)-dimensional boundaries. These boundaries divide csupp(Q) into K sets C1, …, CK, where Cj = {x ∈ ℝd : ϕ̄(x) = ϕj(x)}. The sets C1, …, CK are disjoint except on the boundaries, and Leb(Ci ∩ Cj) = 0 for every i ≠ j. For any x, y ∈ Cj and t ∈ (0, 1), ϕ̄(tx + (1 − t)y) ≥ tϕ̄(x) + (1 − t)ϕ̄(y), and ∫_{Cj} e^{ϕ̄(x)} dx ≤ 1. We define Mj(ϕ) and 𝒮(ϕ) as in Section 1. Since L(l, Q) ≤ Σ_{j=1}^K Q(Cj) Mj, each Mj > −∞, and |𝒮(ϕ)| ≥ η > 0, we may focus on Mj > 0, and the only case we have to worry about is all Mj's increasing to infinity. We define Dq = {x ∈ ℝd : ϕ̄(x) ≥ q}. For any c > 0,

\begin{aligned}
L(l, Q) \le \int \bar\phi(x)\, dQ &= \int_{\mathrm{csupp}(Q)\setminus D_{-cM_{(1)}}} \bar\phi(x)\, dQ + \int_{D_{-cM_{(1)}}} \bar\phi(x)\, dQ \\
&\le -cM_{(1)}\big(1 - Q(D_{-cM_{(1)}})\big) + M_{(K)}\, Q(D_{-cM_{(1)}}) \\
&\le (1 + c\eta)\Big(Q(D_{-cM_{(1)}}) - \frac{c\eta}{c\eta + 1}\Big) M_{(K)}.
\end{aligned}

We can always find a sufficiently large c such that the set D_{−cM_{(1)}} is a closed and convex subset of ℝd. We define D_{j,q} = {x ∈ Cj : ϕ̄(x) ≥ q} ⊂ Cj. Obviously Leb(D_{−cM_{(1)}}) = Σ_{j=1}^K Leb(D_{j,−cM_{(1)}}). For any c > 0, applying Lemma 3 to the set D_{j,−cM_{(1)}} with M = M_{(1)} yields Leb(D_{j,−cM_{(1)}}) ≤ ((1 + c)M_{(1)})^d e^{−M_{(1)}}/(d! + o(1)) → 0 as M_{(1)} → ∞ for every j = 1, …, K. Consequently, Leb(D_{−cM_{(1)}}) → 0 as M_{(1)} → ∞. By our definition, η ∈ (0, 1]. Thus, by Lemma 2, we can find a sufficiently large c and small δ such that

\sup\{Q(D) : D \subseteq \mathbb{R}^d \text{ closed and convex},\ \mathrm{Leb}(D) \le \delta\} < \frac{c\eta}{c\eta + 1}.

Thus, L(l, Q) → −∞ as M(1) → ∞, which rules out the case in which all the Mj's increase to infinity. On the other hand, L(l, Q) ≤ M(K). These considerations show that L(Q) is finite and equals the supremum of L(l, Q) over ϕ whose Mj's are confined to suitable finite intervals (j = 1, …, K).

Let ϕm,j’s and λm,j’s form a sequence lm(x) = logΣλm,j exp{ϕm,j(x)} such that −∞ < L(lm, Q) ↑ L(Q) as m → ∞. Next, we will prove that for every j ∈ {1, …, K}, there exists a point, say, x0,jinterior(csupp(Q)), such that lim infm→∞ ϕm,j(x0,j) > −∞.

We define ϕ̄m(x) = maxj ϕm,j(x), Cm,j = {x ∈ ℝd : ϕ̄m(x) = ϕm,j(x)}, and Mm,j = maxx∈ℝd ϕm,j(x). For any j* ∈ {1, …, K}, pick any x0,j* ∈ Cm,j* with ϕm,j*(x0,j*) below the maximum of ϕm,j* over Cm,j*; then there exists a sufficiently small ε ≥ 0 such that the set Em,j* = {x ∈ Cm,j* : ϕm,j*(x) ≥ ϕm,j*(x0,j*) + ε} is a closed and convex subset of Cm,j* and x0,j* is not an interior point of Em,j*. Thus,

\begin{aligned}
L(l_m, Q) = \int l_m\, dQ \le \int \bar\phi_m(x)\, dQ &= \sum_{j \ne j^*} \int_{C_{m,j}} \phi_{m,j}(x)\, dQ + \int_{C_{m,j^*}} \phi_{m,j^*}\, dQ \\
&\le \sum_{j \ne j^*} M_{m,j}\, Q(C_{m,j}) + \phi_{m,j^*}(x_{0,j^*})\, Q(C_{m,j^*}) + \big(M_{m,j^*} - \phi_{m,j^*}(x_{0,j^*})\big)\, Q(E_{m,j^*}) \\
&\le \sum_{j=1}^{K} \max(M_{m,j}, 0) + \phi_{m,j^*}(x_{0,j^*})\, Q(C_{m,j^*})\big(1 - h_{j^*}(Q, x_{0,j^*})\big).
\end{aligned}

These inequalities hold when ϕm,j*(x0,j*) attains the maximum of ϕm,j* over Cm,j* as well (with ε = 0 accordingly). By Lemma 6, hj*(Q, x0,j*) < 1. Since Mm,j* is finite, interior(Cm,j*) is not empty. Consequently, lim infm→∞ Q(Cm,j*) > 0, which yields

\phi_{m,j^*}(x_{0,j^*}) \ge -\frac{\sum_{j=1}^{K} \max(M_{m,j}, 0) - L(l_m, Q)}{Q(C_{m,j^*})\big(1 - h_{j^*}(Q, x_{0,j^*})\big)} \ge -\frac{\sum_{j=1}^{K} \max(M_j, 0) - L(l_1, Q)}{Q(C_{m,j^*})\big(1 - h_{j^*}(Q, x_{0,j^*})\big)} > -\infty.

Hence, the set Hj = {x : lim infm→∞ ϕm,j(x) > −∞} is not empty for every j ∈ {1, …, K}. From Lemma 1 we conclude that for each ϕj, we can find finite positive constants aj, bj > 0 such that ϕj(x) ≤ aj − bj||x|| ≤ a − b||x||, where a = maxj aj > 0 and b = minj bj > 0. Then by Lemma 4, there exist a subsequence (ϕ1,m(k1))k1 of (ϕ1,m)m and a concave function ϕ1 such that:

\lim_{k_1 \to \infty,\, x \to y} \phi_{1,m(k_1)}(x) = \phi_1(y) \quad \text{for all } y \in \mathrm{interior}(\mathrm{dom}(\phi_1)),
\limsup_{k_1 \to \infty,\, x \to y} \phi_{1,m(k_1)}(x) \le \phi_1(y) \quad \text{for all } y \in \mathbb{R}^d.

If we define ϕ1 = −∞ on ℝd \ dom(ϕ1), then we can rewrite them as:

\limsup_{k_1 \to \infty} \phi_{1,m(k_1)}(x) \le \phi_1(x) \quad \text{for all } x \in \partial(\mathrm{dom}(\phi_1)),
\lim_{k_1 \to \infty} \phi_{1,m(k_1)}(x) = \phi_1(x) \quad \text{for all } x \in \mathbb{R}^d \setminus \partial(\mathrm{dom}(\phi_1)).

We can find a sub-subsequence of this subsequence with the analogous property for ϕ2,m(k2). Proceeding sequentially through all the ϕm,j's and λm,j's yields a common subsequence lm(k) and a function l*(x) = log Σj λj exp{ϕj(x)} such that:

\limsup_{k \to \infty} l_{m(k)}(x) \le l^*(x) \quad \text{for all } x \in \wp,
\lim_{k \to \infty} l_{m(k)}(x) = l^*(x) \quad \text{for all } x \in \mathbb{R}^d \setminus \wp,

where ℘ = ∪_{j=1}^K ∂(dom(ϕj)) and Leb(℘) = 0. The next step is to prove that l*(x) is the maximizer. Applying Fatou's lemma to the subsequence of functions lm(k)(x) ≤ a − b||x|| yields

\limsup_{k \to \infty} \int l_{m(k)}\, dQ \le \int l^*\, dQ.

Hence,

L(Q) \ge L(l^*, Q) \ge \limsup_{k \to \infty} L(l_{m(k)}, Q) = L(Q),

from which we conclude L(l*, Q) = L(Q). The first inequality follows from the definition of L(Q). The last equality follows because lm(k) is a sequence along which L(lm(k), Q) increases to L(Q) as k → ∞. This establishes the existence of the maximizer l*, and hence of the maximizing λj's and ϕj's.

Appendix C: Proof of Theorem 2

We prove the theorem for a subsequence of Qn. Let L(Qn) → Γ. As in the proof of Theorem 1, ln(x) ≤ a − b||x|| and infn ϕn,j(x0) > −∞ for some x0 ∈ interior(csupp(Q0)). Therefore, for a subsequence of (Qn)n, there exists a function l* such that ln(y), l*(y) ≤ a − b||y||, and

\limsup_{k \to \infty} l_{n(k)}(x) \le l^*(x) \quad \text{for all } x \in \wp,
\lim_{k \to \infty} l_{n(k)}(x) = l^*(x) \quad \text{for all } x \in \mathbb{R}^d \setminus \wp.

By Skorohod's theorem, there exists a probability space with random variables Xn ~ Qn and X ~ Q0 such that Xn → X almost surely. We define the random variable Hn = a − b||Xn|| − ln(Xn) ≥ 0. Applying Fatou's lemma to Hn, and writing γ = limn→∞ ∫ ||x||dQn = ∫ ||x||dQ0 (by the Mallows convergence), yields

\begin{aligned}
\Gamma = \lim_{n \to \infty} \int l_n\, dQ_n &= \lim_{n \to \infty} \Big[\int (a - b\|x\|)\, dQ_n - E(H_n)\Big] = a - b\gamma - \liminf_{n \to \infty} E(H_n) \\
&\le a - b\gamma - E\big(\liminf_{n \to \infty} H_n\big) \le a - b\gamma - E\big(a - b\|X\| - l^*(X)\big) \\
&= b\Big(\int \|x\|\, dQ_0 - \gamma\Big) + \int l^*(x)\, dQ_0 = L(l^*, Q_0) \le L(Q_0).
\end{aligned}

Let l0(x) = log Σj λj exp{ϕj(x)}, i.e., the λj's and ϕj's are the maximizers corresponding to l0 = l(·|Q0). In the following proof we utilize a special approximation scheme. Let l^{(ε)}(x) = log Σj λj^{(ε)} exp{ϕj^{(ε)}(x)} with λj^{(ε)} = λj and ϕj^{(ε)}(x) = inf_{vj, cj} (vj^T x + cj), where the infimum is over all (vj, cj) with ||vj|| ≤ ε^{−1} and ϕj(y) ≤ vj^T y + cj for all y. DSS 2011 show that the approximation ϕj^{(ε)} is real valued and Lipschitz continuous with constant ε^{−1}; consequently, l^{(ε)}(x) is also Lipschitz continuous with constant ε^{−1}. Moreover, ϕj^{(ε)} ≥ ϕj and ϕj^{(ε)} ↓ ϕj pointwise as ε ↓ 0. Thus, l^{(ε)} ↓ l0 pointwise as ε ↓ 0 and l^{(1)} ≥ l^{(ε)} ≥ l0 for ε ∈ (0, 1). With this approximation, it follows from the Lipschitz continuity, ∫ ||x||dQ0 = γ < ∞, and the stronger version of Lemma 5 that

\Gamma = \lim_{n \to \infty} \int l_n\, dQ_n \ge \lim_{n \to \infty} L(l^{(\varepsilon)}, Q_n) = \lim_{n \to \infty} \Big[\int l^{(\varepsilon)}\, dQ_n - \sum_{j=1}^{K} \pi_j \int e^{\phi_j^{(\varepsilon)}(x)}\, dx + 1\Big] = \int l^{(\varepsilon)}\, dQ_0 - \sum_{j=1}^{K} \pi_j \int \exp\{\phi_j^{(\varepsilon)}(x)\}\, dx + 1.

Applying the monotone convergence theorem to the functions l^{(1)} − l^{(ε)} and the dominated convergence theorem to the exp{ϕj^{(ε)}}'s yields lim_{ε↓0+} L(l^{(ε)}, Q0) = L(l0, Q0). Hence, Γ ≥ L(Q0). Combining this with Γ ≤ L(l*, Q0) ≤ L(Q0) yields Γ = L(Q0) = L(l*, Q0), which indicates that l* equals the maximizer l0 = l(·|Q0) corresponding to L(Q0).

Applying these results to the densities fn = exp{ln} and f0 = exp{l0} yields

\lim_{n \to \infty,\, x \to y} f_n(x) = f_0(y) \quad \text{for all } y \in \mathbb{R}^d \setminus \wp,
\limsup_{n \to \infty,\, x \to y} f_n(x) \le f_0(y) \quad \text{for all } y \in \wp,

where ℘ = ∪_{j=1}^K ∂({f0j > 0}) and Leb(℘) = 0. Consequently, (fn)n → f0 almost everywhere with respect to Lebesgue measure. In addition, |fn(x)| ≤ e^{a − b||x||}, and ∫ e^{a − b||x||} dx is finite. Applying Lebesgue's dominated convergence theorem yields

\lim_{n \to \infty} \int |f_n(x) - f_0(x)|\, dx = 0.

Consequently, Theorem 2 holds for a subsequence of the original sequence (Qn)n. It remains to show that it holds for the entire sequence.

Suppose one of the assertions about fn were false. Then one could replace the initial sequence (Qn)n from the start with a subsequence along which one of the following three conditions holds:

  1. limn→∞ fn(xn) > f0(y) for some sequence (xn)n converging to a point y;

  2. limn→∞ fn(xn) < f0(y) for some sequence (xn)n converging to a point y;

  3. limn→∞ ∫ |fn(x) − f0(x)|dx > 0.

Each of these three properties would be inherited by every further subsequence of (Qn)n, which leads to a contradiction.


Contributor Information

Hao Hu, Email: hhu5@ncsu.edu.

Yichao Wu, Email: wu@stat.ncsu.edu.

Weixin Yao, Email: weixin.yao@ucr.edu.

References

  1. Balabdaoui Fadoua, Doss Charles R. Inference for a mixture of symmetric distributions under log-concavity. 2014. arXiv preprint arXiv:1411.4708.
  2. Balabdaoui Fadoua, Rufibach Kaspar, Wellner Jon A. Limit distribution theory for maximum likelihood estimation of a log-concave density. Annals of Statistics. 2009;37(3):1299. doi: 10.1214/08-AOS609.
  3. Benaglia Tatiana, Chauveau Didier, Hunter David, Young Derek. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software. 2009;32(6):1–29.
  4. Bordes Laurent, Vandekerkhove Pierre. Semiparametric two-component mixture model with a known component: an asymptotically normal estimator. Mathematical Methods of Statistics. 2010;19(1):22–41.
  5. Bordes Laurent, Mottelet Stéphane, Vandekerkhove Pierre. Semiparametric estimation of a two-component mixture model. The Annals of Statistics. 2006a;34(3):1204–1232.
  6. Bordes Laurent, Delmas Céline, Vandekerkhove Pierre. Semiparametric estimation of a two-component mixture model where one component is known. Scandinavian Journal of Statistics. 2006b;33(4):733–752.
  7. Butucea Cristina, Vandekerkhove Pierre. Semiparametric mixtures of symmetric distributions. Scandinavian Journal of Statistics. 2014;41(1):227–239.
  8. Campbell NA, Mahon RJ. A multivariate study of variation in two species of rock crab of the genus Leptograpsus. Australian Journal of Zoology. 1974;22(3):417–425.
  9. Chang George T, Walther Guenther. Clustering with mixtures of log-concave distributions. Computational Statistics & Data Analysis. 2007;51(12):6242–6251.
  10. Chee Chew-Seng, Wang Yong. Estimation of finite mixtures with symmetric components. Statistics and Computing. 2013;23(2):233–249.
  11. Chen Jiahua, Tan Xianming, Zhang Runchu. Inference for normal mixtures in mean and variance. Statistica Sinica. 2008;18(2):443.
  12. Chen Yining, Samworth Richard J. Smoothed log-concave maximum likelihood estimation with applications. Statistica Sinica. 2013;23:1373–1398.
  13. Cule Madeleine, Samworth Richard. Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electronic Journal of Statistics. 2010;4:254–270.
  14. Cule Madeleine, Gramacy Robert, Samworth Richard. LogConcDEAD: An R package for maximum likelihood estimation of a multivariate log-concave density. Journal of Statistical Software. 2009;29(2):1–20.
  15. Cule Madeleine, Samworth Richard, Stewart Michael. Maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2010;72(5):545–607.
  16. Dempster Arthur P, Laird Nan M, Rubin Donald B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological). 1977:1–38.
  17. Doss Charles, Wellner Jon A. Global rates of convergence of the MLEs of log-concave and s-concave densities. 2013. arXiv preprint arXiv:1306.1438. doi: 10.1214/15-AOS1394.
  18. Dümbgen Lutz, Rufibach Kaspar. Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency. Bernoulli. 2009;15(1):40–68.
  19. Dümbgen Lutz, Samworth Richard, Schuhmacher Dominic. Approximation by log-concave distributions, with applications to regression. The Annals of Statistics. 2011;39(2):702–730.
  20. Hathaway Richard J. A constrained formulation of maximum-likelihood estimation for normal mixture distributions. The Annals of Statistics. 1985:795–800.
  21. Hohmann Daniel, Holzmann Hajo. Semiparametric location mixtures with distinct components. Statistics. 2013;47(2):348–362.
  22. Hubert Lawrence, Arabie Phipps. Comparing partitions. Journal of Classification. 1985;2(1):193–218.
  23. Hunter David R, Wang Shaoli, Hettmansperger Thomas P. Inference for mixtures of symmetric distributions. The Annals of Statistics. 2007:224–251.
  24. Kim Arlene KH, Samworth Richard J. Global rates of convergence in log-concave density estimation. 2014. arXiv preprint arXiv:1404.2298.
  25. Ma Yanyuan, Yao Weixin. Flexible estimation of a semiparametric two-component mixture model with one parametric component. Electronic Journal of Statistics. 2015;9:444–474.
  26. McLachlan Geoffrey, Krishnan Thriyambakam. The EM Algorithm and Extensions. Vol. 382. John Wiley & Sons; 2007.
  27. McLachlan Geoffrey, Peel David. Finite Mixture Models. John Wiley & Sons; 2000.
  28. McNicholas Paul David, Murphy Thomas Brendan. Parsimonious Gaussian mixture models. Statistics and Computing. 2008;18(3):285–296.
  29. Pal Jayanta Kumar, Woodroofe Michael, Meyer Mary. Estimating a Polya frequency function. Lecture Notes–Monograph Series. 2007:239–249.
  30. Rufibach Kaspar. Computing maximum likelihood estimators of a log-concave density function. Journal of Statistical Computation and Simulation. 2007;77(7):561–574.
  31. Stephens Matthew. Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2000;62(4):795–809.
  32. Walther Guenther. Detecting the presence of mixing with multiscale maximum likelihood. Journal of the American Statistical Association. 2002;97(458):508–513.
  33. Xiang Sijia, Yao Weixin, Wu Jingjing. Minimum profile Hellinger distance estimation for a semiparametric mixture model. Canadian Journal of Statistics. 2014;42(2):246–267.
  34. Yao Weixin. A profile likelihood method for normal mixture with unequal variance. Journal of Statistical Planning and Inference. 2010;140(7):2089–2098.
  35. Yao Weixin. Label switching and its solutions for frequentist mixture models. Journal of Statistical Computation and Simulation. 2015;85(5):1000–1012.
  36. Yao Weixin, Lindsay Bruce G. Bayesian mixture labeling by highest posterior density. Journal of the American Statistical Association. 2012.
