Published in final edited form as: Comput Stat Data Anal. 2016 Sep;101:137–147. doi: 10.1016/j.csda.2016.03.002. Author manuscript; available in PMC 2017 Sep 1.

Maximum likelihood estimation of the mixture of log-concave densities

Hao Hu a,*, Yichao Wu a, Weixin Yao b
PMCID: PMC4820769  NIHMSID: NIHMS767573  PMID: 27065505

Abstract

Finite mixture models are useful tools and can be estimated via the EM algorithm. A main drawback is the strong parametric assumption about the component densities. In this paper, a much more flexible mixture model is considered, which assumes each component density to be log-concave. Under fairly general conditions, the log-concave maximum likelihood estimator (LCMLE) exists and is consistent. Numerical examples are provided to demonstrate that the LCMLE improves the clustering results when compared with the traditional MLE for parametric mixture models.

Keywords: Consistency, Log-concave maximum likelihood estimator (LCMLE), Mixture model

1. Introduction

The finite mixture model (McLachlan & Peel, 2000; McNicholas & Murphy, 2008) provides a flexible methodology for both theoretical and practical analysis. It has a density of the form

f(x) = \sum_{j=1}^{K} \lambda_j g_j(x; \theta_j), \qquad x \in \mathbb{R}^d,   (1.1)

where λ1, …, λK are the mixing proportions and the gj(x; θj)'s are component densities. The unknown parameters in the mixture model (1.1) can be estimated by the EM algorithm; see, e.g., Dempster et al. (1977) and McLachlan & Krishnan (2007). One major drawback of the traditional mixture model (1.1) is the strong parametric assumption about the component density gj. It is often too restrictive, and the density estimation may be inaccurate due to model misspecification. Another drawback is that each parametric model requires its own EM algorithm tailored to the assumed component densities.

To relax the parametric assumption, nonparametric shape constraints are becoming increasingly popular. In this paper, we make one fairly general shape constraint for our mixture model: we assume that each component density is log-concave. A density g is log-concave if log g is concave. Examples of log-concave densities include the normal, Laplace, and logistic densities, as well as the gamma and beta densities under certain parameter constraints. Log-concave densities have many attractive properties, as described by Balabdaoui et al. (2009). Their nonparametric maximum likelihood estimators were studied by Dümbgen & Rufibach (2009), Cule et al. (2010), Cule & Samworth (2010), Chen & Samworth (2013), Pal et al. (2007), and Dümbgen et al. (2011) (referred to as DSS 2011 hereafter). The convergence rates of these estimators for log-concave densities were studied by Doss & Wellner (2013) and Kim & Samworth (2014). Such estimators provide generality and flexibility without any tuning parameters.
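As a quick numerical illustration of the shape constraint (a minimal sketch of ours, not part of the original analysis), one can check log-concavity of a candidate density by verifying that its log-density has non-positive second differences on a grid; the Gamma(2, 1) case below is one of the examples mentioned above.

## Numerical check of log-concavity for Gamma(2, 1): second differences of the
## log-density should be (up to rounding) non-positive on a grid.
x <- seq(0.1, 10, by = 0.01)                 # grid away from the boundary x = 0
logf <- dgamma(x, shape = 2, rate = 1, log = TRUE)
all(diff(logf, differences = 2) <= 1e-10)    # TRUE: consistent with concavity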

In our model, we assume that x1, …, xn are independent d-dimensional random variables with distribution Q0 and the mixture density f0. The mixture density f0 belongs to a given class

\mathcal{F} = \Big\{ f : f(x) = \sum_{j=1}^{K} f_j(x) = \sum_{j=1}^{K} \lambda_j \exp\{\phi_j(x)\}, \ \lambda \in \Lambda, \ \phi \in \Phi \Big\},   (1.2)

where λ = (λ1, …, λK), \Lambda = \{(\lambda_1, \ldots, \lambda_K) : 0 < \lambda_j < 1, \ \sum_{j=1}^{K} \lambda_j = 1\}, ϕ = (ϕ1, …, ϕK), and Φ = {(ϕ1, …, ϕK) : each ϕj is concave}. We assume that each ϕj is continuous and coercive in the sense that ϕj(x) → −∞ as ||x|| → ∞ (j = 1, …, K).

One issue for mixture models is that the likelihood might be unbounded in some cases. For example, the likelihood function for a normal mixture takes the form L(\theta \mid x) = \prod_{i=1}^{n} \big(\lambda g(x_i \mid \mu_1, \sigma_1^2) + (1 - \lambda) g(x_i \mid \mu_2, \sigma_2^2)\big), where \theta = (\lambda, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2) with \sigma_1^2, \sigma_2^2 > 0 and \lambda \in (0, 1), and g(\cdot \mid \mu, \sigma^2) denotes the normal density with mean μ and variance σ². When μ1 = x1 and σ1² → 0, L(θ|x) → ∞ (see Section 3.10 of McLachlan & Peel (2000) for a detailed discussion). Many methods have been proposed to resolve the unboundedness of the mixture likelihood; see, for example, Hathaway (1985), Chen et al. (2008), and Yao (2010). Note that, as for traditional normal mixture models with unequal variances, the likelihood functions for mixtures of log-concave densities are unbounded as well. Thus, similar to Hathaway (1985), we define the LCMLE on a constrained parameter space. Let M_j(\phi) = \max_{x \in \mathbb{R}^d} \phi_j(x), M_{(1)}(\phi) = \min_j M_j(\phi), and M_{(K)}(\phi) = \max_j M_j(\phi), and define the ratio \mathcal{S}(\phi) = M_{(1)}(\phi)/M_{(K)}(\phi). Following Hathaway (1985), we restrict attention to the constrained subspace \Phi_\eta = \{\phi \in \Phi : |\mathcal{S}(\phi)| \ge \eta > 0\} for some η ∈ (0, 1]. This restriction rules out estimates in which the maxima of the component log-densities differ too much. By restricting to Φη, we focus on f ∈ ℱη, where

\mathcal{F}_\eta = \Big\{ f : f(x) = \sum_{j=1}^{K} f_j(x) = \sum_{j=1}^{K} \lambda_j \exp\{\phi_j(x)\}, \ \lambda \in \Lambda, \ \phi \in \Phi_\eta \Big\}.   (1.3)

Let Qn be the empirical distribution of X1, …, Xn. The (restricted) log-concave maximum likelihood estimator (LCMLE) is

\hat{f}_n = f(\cdot \mid Q_n) = \arg\max_{f \in \mathcal{F}_\eta} \int \log(f)\, dQ_n.   (1.4)

In practice, similar to Hathaway (1985), picking η can be tricky in some extreme cases. If η is too small, there is a chance that a boundary point with |𝒮(ϕ)| = η maximizes the log-likelihood, in which case the solution depends on the choice of η. In this paper, we do not focus on the issue of choosing η; the constrained subspace Φη is mainly used for theoretical development. In our empirical experience, if we start the algorithm from a reasonable initial value, such as the maximum likelihood estimate assuming all components are normal with equal variance, the unboundedness issue is very rare.

Many methods have been proposed to relax the parametric assumption in (1.1). Hunter et al. (2007), Bordes et al. (2006a), Butucea & Vandekerkhove (2014), and Chee & Wang (2013) considered extensions of (1.1) in which all component densities are symmetric but unknown. Bordes et al. (2006b), Bordes & Vandekerkhove (2010), Hohmann & Holzmann (2013), Xiang et al. (2014), and Ma & Yao (2015) considered extensions of (1.1) with K = 2 in which one of the component densities is symmetric but unknown. Mixtures of log-concave densities have been studied by Chang & Walther (2007), Cule et al. (2010), and Balabdaoui & Doss (2014). Chang & Walther (2007) provided an EM-type algorithm and demonstrated sound numerical results in a simulation study. Cule et al. (2010) applied the log-concave mixture model to the Wisconsin breast cancer data set. Balabdaoui & Doss (2014) considered the special case in which all components share the same symmetric log-concave density but have different location parameters, and proved the √n-consistency of their proposed M-estimators of the mixing proportions and location parameters. Note that these models are special cases of the family ℱ, so their estimators and asymptotic results cannot be applied here. For example, the mixture of normal distributions with different component means and variances belongs to ℱ but does not belong to the model family considered by Balabdaoui & Doss (2014).

To the best of our knowledge, none of the existing works has studied the theoretical properties of the estimator for the log-concave mixture model (1.2) under such general conditions. This paper aims to fill this gap. We show that the LCMLE (in the restricted subset ℱη) exists and is consistent under fairly general conditions. We point out, however, that extending the properties of the log-concave MLE to mixtures of log-concave densities is not trivial. The log-density ln = l(·|Qn) = log fn is no longer guaranteed to be a concave function. Consequently, many of the nice theoretical properties stated in DSS 2011 no longer hold for our mixture model.

The rest of the paper is organized as follows. Section 2 introduces the basic setup, model details, and notation. Section 3 states the theoretical properties. We review the EM-type algorithm for log-concave mixture models in Section 4. Simulation and real data studies are conducted in Sections 5 and 6. We end the article with a short conclusion in Section 7. The proofs and lemmas are presented in the appendix.

2. Log-concave maximum likelihood estimator

Let 𝒬 = 𝒬(d) be the family of all distributions Q on ℝd. Our goal is to maximize a log-likelihood-type functional:

L(\phi, \lambda, \pi, Q) = \int \log\Big[\sum_{j=1}^{K} \lambda_j \exp\{\phi_j(x)\}\Big]\, dQ(x) - \sum_{j=1}^{K} \pi_j \Big(\int \exp\{\phi_j(x)\}\, dx - 1\Big),   (2.1)

where πj’s are Lagrange multipliers to incorporate the constraint ∫ exp{ϕj(x)}dx = 1 (j = 1, …, K). We define a profile log-likelihood:

L(Q) = \sup_{\phi \in \Phi_\eta,\, \lambda \in \Lambda,\, \pi} L(\phi, \lambda, \pi, Q).   (2.2)

If, for fixed Q, (ψ, λ*, π*) maximizes L(ϕ, λ, π, Q), then it automatically satisfies:

\pi_j^* = E\big(\pi^*(j \mid x)\big) = \int \frac{\lambda_j^* \exp\{\psi_j(x)\}}{\sum_{h=1}^{K} \lambda_h^* \exp\{\psi_h(x)\}}\, dQ(x);   (2.3)
\int \exp\{\psi_j(x)\}\, dx = 1 \quad (j = 1, 2, \ldots, K).   (2.4)

Note that, differing from the non-mixture setting in DSS 2011, π*j is not equal to 1.

To verify this, note that ϕ + c ∈ Φ for any fixed vector of functions ϕ ∈ Φ and arbitrary c = (c1, …, cK)T ∈ ℝK, and

\frac{\partial L(\psi + c, \lambda^*, \pi^*, Q)}{\partial c_h}\Big|_{c=0} = \int \frac{\lambda_h^* \exp\{\psi_h(x)\}}{\sum_{j=1}^{K} \lambda_j^* \exp\{\psi_j(x)\}}\, dQ(x) - \pi_h^* \int e^{\psi_h(x)}\, dx = 0,
\frac{\partial L(\psi, \lambda^*, \pi^*, Q)}{\partial \pi_h} = 1 - \int \exp\{\psi_h(x)\}\, dx = 0.

The maximizer (ψ, λ*) yields the maximizing log-density l(x) = \log \sum_{j=1}^{K} \lambda_j^* e^{\psi_j(x)}.

3. Theoretical Properties

Before we state the main theorems, we first define the convex support of a distribution.

Definition

For any distribution Q, let Q(C) denote the probability of a set C under Q. The convex support of Q is defined as

\mathrm{csupp}(Q) = \bigcap\{C : C \subseteq \mathbb{R}^d \text{ closed and convex},\ Q(C) = 1\}.

The convex support is itself closed and convex with Q(csupp(Q)) = 1.

In the following, we define:

  • 𝒬1 = {Q ∈ 𝒬 : ∫ ||x||dQ < ∞} (throughout, ||x|| denotes the Euclidean norm).

  • 𝒬0 = {Q ∈ 𝒬 : interior(csupp(Q)) ≠ ∅}.

Theorem 1

For any Q ∈ 𝒬1 ∩ 𝒬0, the value of L(Q) is real and there exists a maximizer:

(\psi, \lambda^*, \pi^*) = \arg\max_{\phi \in \Phi_\eta,\, \lambda \in \Lambda,\, \pi} L(\phi, \lambda, \pi, Q) \quad \text{such that} \quad \int e^{\psi_j(x)}\, dx = 1 \ \text{for } j = 1, \ldots, K.

Next, we establish the consistency of the estimated mixture density. In the following, convergence of distributions is understood with respect to the Mallows distance D1: D_1(Q, Q') = \inf_{(X, X')} E\|X - X'\|, where Q and Q′ are two distributions and the infimum is taken over all pairs (X, X′) such that X ~ Q and X′ ~ Q′. Convergence of Qn to Q0 with respect to the Mallows distance, i.e. limn→∞ D1(Qn, Q0) = 0, is equivalent to Qn converging weakly to Q0, denoted Qn →w Q0, together with ∫ ||x||dQn(x) → ∫ ||x||dQ0(x) as n → ∞.

Theorem 2

Let Qn be a sequence such that limn→∞ D1(Qn, Q0) = 0 for some Q0 ∈ 𝒬1 ∩ 𝒬0. Then,

\lim_{n \to \infty} L(Q_n) = L(Q_0).

Let the ϕnj's and λnj's be the maximizers corresponding to the profile log-likelihood L(Qn), i.e., fn(x) = Σj λnj exp{ϕnj(x)} = f(·|Qn) ∈ ℱη. For f0(x) = f(·|Q0) ∈ ℱη, we have:

\lim_{n \to \infty,\, x \to y} f_n(x) = f_0(y) \quad \text{for all } y \in \{f_0 > 0\},   (3.1)
\limsup_{n \to \infty,\, x \to y} f_n(x) \le f_0(y) \quad \text{for all } y \in \mathbb{R}^d,   (3.2)
\lim_{n \to \infty} \int |f_n(x) - f_0(x)|\, dx = 0.   (3.3)

The above theorem shows the consistency of the estimated mixture density. If we further assume that the true mixture density f0(x) is identifiable, then the estimated component densities and mixing proportions are also consistent. We discuss the identifiability issue further in Section 7.

4. EM-type algorithm

The EM algorithm for estimating log-concave mixture densities has already been developed by Chang & Walther (2007); here we briefly summarize it. We first randomly generate initial values for the normal-mixture EM algorithm and run it until convergence, and then use its output as starting values for our EM-type algorithm. We treat the observed data X = (x1, …, xn)T ∈ ℝn×d as incomplete and define the missing data Z = (z1, …, zn)T, where xi = (xi,1, …, xi,d)T and zi is a K-dimensional vector whose j-th element is given by:

z_{ij} = \begin{cases} 1 & \text{if } x_i \text{ belongs to the } j\text{th group}, \\ 0 & \text{otherwise}. \end{cases}

So the complete log-likelihood is:

\log f(\phi, \lambda; X, Z) = \log \prod_{i=1}^{n} \prod_{j=1}^{K} \big[\lambda_j e^{\phi_j(x_i)}\big]^{z_{ij}} = \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij}\big[\log \lambda_j + \phi_j(x_i)\big].

In the E-step, we replace zij by

z_{ij}^{(t+1)} = \frac{\hat{\lambda}_j^{(t)} e^{\hat{\phi}_j^{(t)}(x_i)}}{\sum_{h=1}^{K} \hat{\lambda}_h^{(t)} e^{\hat{\phi}_h^{(t)}(x_i)}}.

In the M-step, we first update λ by \hat{\lambda}_j^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n} z_{ij}^{(t+1)}, j = 1, …, K. We then update ϕj by maximizing \sum_{i=1}^{n} z_{ij}^{(t+1)} \phi_j(x_i) with respect to ϕj through the function mlelcd in the R package LogConcDEAD (Cule et al., 2009), obtaining the estimator \hat{\phi}_j^{(t+1)} for j = 1, …, K. The estimation of ϕ̂j has been studied by Walther (2002) and Rufibach (2007). Given i.i.d. data X1, …, Xn from a log-concave density f, the log-concave maximum likelihood estimator f̂n exists uniquely and has support on the convex hull of the data (by Theorem 2 of Cule et al. (2010)). The estimated log-density log f̂n is a piecewise linear function whose knots form a subset of {X1, …, Xn}. Walther (2002) and Rufibach (2007) provided algorithms for computing f̂n(Xi), i = 1, …, n; the entire log-density log f̂n can then be computed by linearly interpolating between log f̂n(X(i)) and log f̂n(X(i+1)). Walther (2002) and Rufibach (2007) also pointed out that it is natural to incorporate observation weights, as required by an EM-type algorithm: in our algorithm, z_{1j}^{(t+1)}, …, z_{nj}^{(t+1)} serve as weights for x1, …, xn when estimating the log-concave component ϕj, j = 1, …, K. The algorithm stops once the increment ℓ(t+1) − ℓ(t) falls below 10−7, where \ell^{(t)} = \sum_{i=1}^{n} \log \sum_{j=1}^{K} \hat{\lambda}_j^{(t)} \exp\{\hat{\phi}_j^{(t)}(x_i)\}.
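The loop below is a minimal R sketch of this EM-type algorithm (not the authors' implementation). It assumes that LogConcDEAD::mlelcd accepts an observation-weight vector w summing to one and that LogConcDEAD::dlcd evaluates a fitted log-concave density at the data points; for simplicity it initializes with softened k-means labels rather than the normal-mixture EM described above.

library(LogConcDEAD)

## EM-type algorithm for a K-component log-concave mixture (sketch).
lcmix_em <- function(x, K, maxit = 100, tol = 1e-7) {
  x <- as.matrix(x); n <- nrow(x)
  ## soften hard k-means labels so every observation gets a positive weight
  z <- 0.9 * model.matrix(~ factor(kmeans(x, K)$cluster) - 1) + 0.1 / K
  loglik_old <- -Inf
  for (t in seq_len(maxit)) {
    ## M-step: mixing proportions and weighted log-concave component fits
    lambda <- colMeans(z)
    fits <- lapply(seq_len(K), function(j) mlelcd(x, w = z[, j] / sum(z[, j])))
    ## E-step: posterior component probabilities (responsibilities)
    dens <- sapply(seq_len(K), function(j) lambda[j] * dlcd(x, fits[[j]]))
    loglik <- sum(log(rowSums(dens)))
    z <- dens / rowSums(dens)
    if (loglik - loglik_old < tol) break   # stopping rule of Section 4
    loglik_old <- loglik
  }
  list(lambda = lambda, fits = fits, posterior = z, loglik = loglik)
}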

To avoid local maxima, we restart the algorithm 20 times and choose the result with the highest log-likelihood. As discussed in Section 1, the unboundedness issue of the log-likelihood does happen infrequently, mostly due to an inappropriate initial value. In our algorithm, we borrow the restarting idea used in many existing EM algorithms for parametric mixture models, e.g., Benaglia et al. (2009): if the log-likelihood diverges to infinity in any iteration, our EM-type algorithm is forced to restart from the beginning with a new randomly chosen initial value.

5. Simulation Results

5.1. Copula procedure to generate multivariate log-concave mixtures

Since the LCMLE involves no tuning parameters, its most attractive application is density estimation in dimension higher than one. To generate data from a multivariate log-concave mixture model, we borrow the copula procedure of Chang & Walther (2007). For a d-dimensional log-concave mixture density, we observe n observations x1, …, xn, where xi = (xi,1, …, xi,d)T ∈ ℝd. To simplify the simulation, we focus on models whose univariate marginal distributions are log-concave and model the dependence structure with a normal copula. Suppose (N1, …, Nd)T is multivariate normal with mean 0 and covariance matrix Σ, and let F1, …, Fd be the CDFs of the desired univariate log-concave distributions. Then, with Φ denoting the standard normal CDF,

x_i = (x_{i,1}, \ldots, x_{i,d})^T = \big(F_1^{-1}(\Phi(N_1)), \ldots, F_d^{-1}(\Phi(N_d))\big)^T.
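For concreteness, the following R sketch (ours, using MASS::mvrnorm; the paper's own code is not shown) carries out this copula step for d = 2 with a N(0, 1) marginal, a Gamma(2, 1) + 2 marginal, and copula correlation 0.5, the component-2 setting later used as Model II.

library(MASS)  # for mvrnorm

## Copula step (sketch): normal copula with unit variances and correlation 0.5,
## marginals F1 = N(0, 1) and F2 = Gamma(2, 1) shifted by +2.
r_copula <- function(n) {
  Sigma <- 0.5 * diag(2) + 0.5                    # unit diagonal, 0.5 off-diagonal
  N <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)    # (N1, N2) ~ N(0, Sigma)
  U <- pnorm(N)                                   # uniform marginals
  cbind(qnorm(U[, 1]),                            # F1^{-1}: standard normal
        qgamma(U[, 2], shape = 2, rate = 1) + 2)  # F2^{-1}: Gamma(2, 1) + 2
}

set.seed(1)
x2 <- r_copula(350)   # e.g. the expected component-2 share of n = 500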

5.2. Significant Improvement when densities are log-concave mixtures

We first generate 500 observations from a univariate log-concave mixture model: 0.3 Logistic(0, 1) + 0.7 Laplace(5, 1) (referred to as Model I). This setup is more general than that of Chang & Walther (2007), who only considered the case in which one component is a location shift of the other. For the multivariate cases, we generate 500 observations via the copula procedure of Section 5.1 for Models II through IV, which are multivariate log-concave mixture models with dimensionality d from 2 to 4. For each model, component 1 (with probability 0.3) is generated from a joint normal distribution N(0, Id); component 2 (with probability 0.7) is generated through a normal copula N(0, 0.5Id + 0.51d), where Id is the d×d identity matrix and 1d is the d×d matrix of ones. The marginal distributions of component 2 are summarized in Table 1.

Table 1.

The simulation setups of Model II – IV.

Model d Marginal Distribution of Component 2
II 2 N (0, 1), and Gamma(2, 1) + 2
III 3 N (0, 1), Gamma(2, 1) − 1, and Beta(4, 1)
IV 4 N (0, 1), Gamma(2, 1) + 2, Beta(4, 1), Laplace(0, 1) + 1

We repeat the simulation 100 times for each model. When evaluating simulation results for mixture models, there is a well-known label switching issue (Stephens, 2000; Yao & Lindsay, 2012). In this paper, we adopt the method of Yao (2015) and find the labels by minimizing the distance between the estimated classification probabilities and the true labels over all permutations. After matching the labels, we compute the mean squared errors of the estimated λ's obtained by the log-concave EM algorithm (MSE2) and by the parametric normal EM algorithm (MSE1) to compare their accuracy. As mixture models also serve as classification methods, we compute the average misclassification number over the 100 replicates (denoted AMN2 for the log-concave EM algorithm and AMN1 for the normal EM algorithm). We are also interested in the difference between the two classification methods. One common measure of the similarity between two clusterings is the Adjusted Rand Index (ARI), which ranges from −1 to 1; see Hubert & Arabie (1985) for a detailed description. We report the average Adjusted Rand Index (AvARI) over the 100 replicates.
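As a concrete reference for this criterion, the small R helper below computes the ARI of Hubert & Arabie (1985) directly from two label vectors; it is only a self-contained sketch (equivalent functions are available in packages such as mclust), and the fit object in the usage comment is hypothetical.

## Adjusted Rand Index between two partitions given as label vectors.
adj_rand_index <- function(a, b) {
  tab <- table(a, b)                      # contingency table of the two clusterings
  comb2 <- function(m) sum(choose(m, 2))  # number of agreeing pairs
  sum_ij <- comb2(tab)
  sum_a  <- comb2(rowSums(tab))
  sum_b  <- comb2(colSums(tab))
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)
}
## e.g. adj_rand_index(true_labels, apply(fit$posterior, 1, which.max))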

We report results over the 100 replicates in Table 2. We observe significantly smaller MSEs for the estimated λ obtained by the log-concave mixture model. For Models I and II in particular, the mean squared errors obtained by the log-concave mixture model are less than half of those obtained by the normal mixture model. In terms of classification, the average misclassification number among the 500 observations is significantly reduced as well. The AvARI indicates that the classification results of the log-concave mixture model and the normal mixture model are quite different, especially for Models I and II.

Table 2.

Simulation results of Model I – IV.

Model d AMN1 AMN2 MSE1 MSE2 AvARI
I 1 17.56 10.86 0.0016 0.0007 0.91
II 2 30.35 13.28 0.0085 0.0013 0.86
III 3 12.43 4.79 0.0010 0.0005 0.93
IV 4 7.97 3.21 0.0006 0.0004 0.95

To compare the classification results for individual replicates, we take d = 4 as an example and show the clustering results in Figure 1. In Figure 1a, each point represents a single replicate from the setup of Model IV; the x-axis gives the number of misclassifications by the normal mixture EM algorithm and the y-axis the number of misclassifications by our log-concave mixture EM algorithm. We observe a clear improvement in the misclassification rates, as the points for all 100 replicates lie below the identity line.

Figure 1.


Four-dimensional clustering result: normal mixture EM-algorithm vs log-concave mixture EM-algorithm by number of misclassifications. The solid lines represent the identity.

To better illustrate the finite sample performance of the LCMLE, we pick one replicate from Model I. To compare the fitted densities with the true densities, Figure 2 plots the true component densities as solid lines and the fitted densities as dashed lines. Even with a sample size of 500, the LCMLE for the log-concave mixture model approximates the true component densities well.

Figure 2.


EM-type algorithm estimation for log-concave mixtures for a single replicate of Model I. Solid line represents the truth and dashed line represents the estimation results. The fitted λ̂ = 0.3076.

5.3. Insignificant penalty when the parametric assumptions are correct

We are also interested in the price we have to pay for this flexibility when the data actually come from normal mixtures. For Models V–VIII, we generate n = 500 observations from a joint normal mixture distribution, in which the first component (with probability 0.4) is a d-dimensional normal distribution with mean 0d and covariance matrix 0.5Id + 0.51d, and the second component (with probability 0.6) is a d-dimensional normal distribution with mean μd and the same covariance matrix, where μ1 = 5, μ2 = (3, 2)T, μ3 = (3, 2, 2)T, and μ4 = (3, 1, 3, 1)T. We again repeat the simulation 100 times and compare the same criteria.
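For illustration, the d = 2 case (Model VI) can be generated as in the sketch below (ours, using MASS::mvrnorm; the parameter values are those stated above).

library(MASS)

## One replicate from Model VI: 0.4 N(0, Sigma) + 0.6 N((3, 2)', Sigma),
## with Sigma = 0.5 * I_2 + 0.5 * 1_2 (unit variances, correlation 0.5).
r_model_vi <- function(n = 500) {
  Sigma <- 0.5 * diag(2) + 0.5
  comp1 <- runif(n) < 0.4
  x <- matrix(NA_real_, n, 2)
  x[comp1, ]  <- mvrnorm(sum(comp1),  mu = c(0, 0), Sigma = Sigma)
  x[!comp1, ] <- mvrnorm(sum(!comp1), mu = c(3, 2), Sigma = Sigma)
  list(x = x, label = ifelse(comp1, 1, 2))
}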

From Table 3, we observe no significant penalty for applying log-concave mixture models instead of normal mixture models. The MSEs and average misclassification numbers for the log-concave mixture model are either almost the same as or only slightly higher than those for the multivariate normal mixture model. The classification results of the log-concave mixture model and the normal mixture model are quite similar, as the AvARI values in Table 3 are close to 1. This is further supported by Figure 1b, which shows the classification results for Model VIII (d = 4): most points lie around the identity line, so there is no significant difference in terms of misclassifications. Consequently, we conclude that the log-concave mixture model is a more flexible methodology that incurs no significant penalty when the data actually come from normal mixtures.

Table 3.

Simulation results of Model V – VIII.

Model d AMN1 AMN2 MSE1 MSE2 AvARI
V 1 3.17 3.51 0.0004 0.0005 0.99
VI 2 33.95 37.12 0.0018 0.0020 0.91
VII 3 21.95 23.57 0.0008 0.0008 0.94
VIII 4 17.71 18.20 0.0018 0.0020 0.96

6. Real Data Application

To further illustrate the performance of log-concave mixture models, we apply the log-concave EM algorithm to the crab data set of Campbell & Mahon (1974), which contains two types of crabs: 100 blue crabs and 100 orange crabs. We focus on the blue crabs, which comprise n1 = 50 males and n2 = 50 females, referred to as groups G1 and G2, respectively. For each crab there are five measurements; we are only interested in two of them, RW (rear width) and BD (body depth), both in mm. Figure 3 gives the scatter plot of RW and BD.
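The same measurements are available as the crabs data in the R package MASS, which gives a convenient way to reproduce Figure 3 (a sketch, assuming that data set matches the one of Campbell & Mahon (1974) analyzed here).

library(MASS)

## Blue crabs only: 50 males (G1) and 50 females (G2); RW and BD in mm.
blue <- subset(crabs, sp == "B")
x <- as.matrix(blue[, c("RW", "BD")])
plot(x, col = ifelse(blue$sex == "M", 1, 2),
     xlab = "RW (mm)", ylab = "BD (mm)")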

Figure 3.


Scatter plot of RW (rear width) and BD (body depth) of the Blue Crab data set.

Fitting a two-dimensional, two-component log-concave mixture model results in 18 misclassified observations: fifteen observations from G1 are misclassified into G2 and three observations from G2 are misclassified into G1. The normal mixture model misclassifies 20 observations in total, with two additional observations from G1 misclassified into G2.

7. Conclusion

The log-concave maximum likelihood estimator (LCMLE) provides more flexibility for estimating mixture densities than traditional parametric mixture models. The LCMLE for log-concave mixtures can be computed by an EM-type algorithm. The LCMLE is not sensitive to model misspecification and, consequently, only one implementation of the EM-type algorithm is needed. Through simulation studies, we observed significant improvements in classification and no significant penalty when the parametric assumption is indeed correct.

In this paper, we proved the existence of the LCMLE for log-concave mixture models, as well as the consistency of the estimated mixture density. If the true mixture density is identifiable, then the estimated component densities are also consistent. However, it is not an easy task to prove the overall identifiability of the most general family of mixtures of log-concave distributions in (1.2) from a nonparametric point of view; some restrictive conditions, such as symmetry, are needed to ensure identifiability. Hunter et al. (2007) and Bordes et al. (2006a) proved the identifiability of (1.1) when K = 2 and both component densities are symmetric with different location parameters. Balabdaoui & Doss (2014) considered a special case of (1.2) in which ϕj(x; θj) = ϕ(x − θj) and ϕ is a concave function symmetric about 0; the identifiability of (1.2) then follows from Hunter et al. (2007) and Bordes et al. (2006a) when K = 2.

Acknowledgments

We wish to thank the Associate Editor and two reviewers for their helpful comments and suggestions that led to improvements in this paper. Hu’s research is partially supported by National Institutes of Health grant R01-CA149569. Wu’s research is partially supported by National Institutes of Health grant R01-CA149569 and National Science Foundation grant DMS-1055210. Yao’s research is supported by NSF grant DMS-1461677.

Appendix A: Lemmas

Lemma 1 is taken from Cule & Samworth (2010). Lemmas 2 to 5 are taken from DSS 2011. Lemma 6 is an extension of Lemma 2.13 of DSS 2011.

Lemma 1

For any log-concave distribution Q with density f, there exist finite constants B1 = B1(Q) > 0 and B2 = B2(Q) > 0 such that f(x) ≤ B1 exp(−B2||x||) for all x ∈ ℝd.

Lemma 2

The following properties of Q are equivalent:

  1. csupp(Q) has non-empty interior.

  2. Q(H) < 1 for any hyperplane H ⊂ ℝd.

  3. With Leb denoting the Lebesgue measure on ℝd,
    \lim_{\delta \downarrow 0} \sup\{Q(A) : A \subseteq \mathbb{R}^d \text{ closed and convex},\ \mathrm{Leb}(A) \le \delta\} < 1.

Lemma 3

Let ϕ be a function such that for any x, y ∈ interior(dom(ϕ)) and t ∈ (0, 1) with tx + (1 − t)y ∈ interior(dom(ϕ)), ϕ(tx + (1 − t)y) ≥ tϕ(x) + (1 − t)ϕ(y), and such that, for C ⊆ ℝd, ∫_C e^{ϕ(x)} dx ≤ 1. We define Dq = {x ∈ C : ϕ(x) ≥ q}. For any r < M ≤ max_{x∈ℝd} ϕ(x),

\mathrm{Leb}(D_r) \le \frac{(M - r)^d\, e^{-M}}{\int_0^{M-r} t^d e^{-t}\, dt}.

Lemma 4

Let ϕ̄, ϕ1, ϕ2, … be concave functions with ϕn ≤ ϕ̄. Further assume that the set H = {x : lim infn→∞ ϕn(x) > −∞} is not empty. Then there exist a subsequence (ϕn(k))k of (ϕn)n and a concave function ϕ with H ⊆ dom(ϕ) = {x ∈ ℝd : ϕ(x) > −∞} such that:

\lim_{k \to \infty,\, x \to y} \phi_{n(k)}(x) = \phi(y) \quad \text{for all } y \in \mathrm{interior}(\mathrm{dom}(\phi)),
\limsup_{k \to \infty,\, x \to y} \phi_{n(k)}(x) \le \phi(y) \quad \text{for all } y \in \mathbb{R}^d.

Lemma 5

Suppose (Qn)n is a sequence of distributions converging weakly to some distribution Q, and let h be a nonnegative continuous function. Then

\liminf_{n \to \infty} \int h\, dQ_n \ge \int h\, dQ.

If the stronger statement lim infn→∞ ∫ h dQn = ∫ h dQ < ∞ holds, then for any function f such that |f|/(1 + h) is bounded,

\lim_{n \to \infty} \int f\, dQ_n = \int f\, dQ.

Lemma 6

A point x ∈ ℝd is an interior point of C if and only if h(Q, x) = sup{Q(E) : E ⊆ C, E closed and convex, x ∉ interior(E)}/Q(C) < 1.

Proof

For x ∉ interior(E) with E closed and convex, there exists a unit vector u ∈ ℝd such that E is contained in the closed set HC(x), which is a subset of C:

C \supseteq H_C(x) = \{y \in C : u^T y \le u^T x\} \supseteq E.

By the definition of h(Q, x) we conclude h(Q, x) ≤ Q(HC(x))/Q(C) ≤ 1. There are two cases: E ⊊ HC(x) and E = HC(x). In the case E ⊊ HC(x), h(Q, x) < 1 strictly by definition. In the case E = HC(x), since x ∉ interior(E) but x ∈ HC(x), we conclude x ∈ ∂HC(x). Now if x ∉ interior(C), then by definition h(Q, x) = 1. On the other hand, if h(Q, x) = 1, then Q(HC(x)) = Q(C), which leads to C = HC(x) = E. Combined with x ∉ interior(HC(x)), we conclude that x ∉ interior(C). Consequently, x ∉ interior(C) ⇔ h(Q, x) = 1. Thus, x ∈ interior(C) ⇔ h(Q, x) < 1.

Appendix B: Proof of Theorem 1

We first prove the finiteness of the log-likelihood-type functional.

L(Q) is the supremum of L(ϕ, λ, π, Q) over all ϕ ∈ Φ, λ ∈ Λ, and π ∈ ℝK. If we take the special case ϕ*j(x) = −log(λj) − ||x|| and π = 0, then L(ϕ*, λ, π, Q) = log K − ∫ ||x||dQ > −∞. Consequently, L(Q) > −∞.

Now we show L(Q) < ∞. As discussed at the end of Section 2, we restrict attention to ϕ such that ∫ e^{ϕj(x)} dx = 1 for j = 1, …, K. Consequently, we define the log-density l(x) = log Σ_{j=1}^K λj e^{ϕj(x)} and rewrite the log-likelihood-type functional as L(l, Q) = L(ϕ, λ, π, Q). For convenience, we define the envelope function ϕ̄(x) = maxj ϕj(x), so that ϕ̄(x) ≥ l(x) for every x ∈ ℝd. This function is continuous but possibly non-smooth on (d − 1)-dimensional boundaries. These boundaries divide csupp(Q) into K sets C1, …, CK, where Cj = {x ∈ ℝd : ϕ̄(x) = ϕj(x)}. The sets C1, …, CK are disjoint except on the boundaries, and Leb(Ci ∩ Cj) = 0 for every i ≠ j. For any x, y ∈ Cj and t ∈ (0, 1), ϕ̄(tx + (1 − t)y) ≥ tϕ̄(x) + (1 − t)ϕ̄(y), and ∫_{Cj} e^{ϕ̄(x)} dx ≤ 1. We define Mj(ϕ) and 𝒮(ϕ) as in Section 1. Since L(l, Q) ≤ Σ_{j=1}^K Q(Cj) Mj, each Mj > −∞, and |𝒮(ϕ)| ≥ η > 0, we may focus on Mj > 0, and the only case we have to worry about is all Mj's increasing to infinity. We define Dq = {x ∈ ℝd : ϕ̄(x) ≥ q}. For any c > 0,

\begin{aligned}
L(l, Q) \le \int \bar\phi(x)\, dQ &= \int_{\mathrm{csupp}(Q)\setminus D_{-cM_{(1)}}} \bar\phi(x)\, dQ + \int_{D_{-cM_{(1)}}} \bar\phi(x)\, dQ \\
&\le -cM_{(1)}\big(1 - Q(D_{-cM_{(1)}})\big) + M_{(K)}\, Q(D_{-cM_{(1)}}) \\
&\le (1 + c\eta)\Big(Q(D_{-cM_{(1)}}) - \frac{c\eta}{c\eta + 1}\Big) M_{(K)}.
\end{aligned}

We can always find a sufficiently large c such that the set D_{−cM_{(1)}} is a closed and convex subset of ℝd. We define D_{j,q} = {x ∈ Cj : ϕ̄(x) ≥ q} ⊂ Cj. Obviously Leb(D_{−cM_{(1)}}) = Σ_{j=1}^K Leb(D_{j,−cM_{(1)}}). For any c > 0, applying Lemma 3 to the set D_{j,−cM_{(1)}} with M = M_{(1)} yields Leb(D_{j,−cM_{(1)}}) ≤ ((1 + c)M_{(1)})^d e^{−M_{(1)}}/(d! + o(1)) → 0 as M_{(1)} → ∞ for every j = 1, …, K. Consequently, Leb(D_{−cM_{(1)}}) → 0 as M_{(1)} → ∞. By our definition, η ∈ (0, 1]. Thus, by Lemma 2, we can find a sufficiently large c and small δ such that

\sup\{Q(D) : D \subseteq \mathbb{R}^d \text{ closed and convex},\ \mathrm{Leb}(D) \le \delta\} < \frac{c\eta}{c\eta + 1}.

Thus, L(l, Q) → −∞ as M(1) → ∞, which rules out the case in which all the Mj's increase to infinity. On the other hand, L(l, Q) ≤ M(K). These considerations show that L(Q) is finite and equals the supremum of L(l, Q) over ϕ whose Mj's are confined to suitable finite intervals (j = 1, …, K).

Let ϕm,j’s and λm,j’s form a sequence lm(x) = logΣλm,j exp{ϕm,j(x)} such that −∞ < L(lm, Q) ↑ L(Q) as m → ∞. Next, we will prove that for every j ∈ {1, …, K}, there exists a point, say, x0,jinterior(csupp(Q)), such that lim infm→∞ ϕm,j(x0,j) > −∞.

We define ϕ̄m(x) = maxj ϕm,j(x), Cm,j = {x ∈ ℝd : ϕ̄m(x) = ϕm,j(x)}, and Mm,j = maxx∈ℝd ϕm,j(x). For any j* ∈ {1, …, K}, pick any x0,j* ∈ Cm,j* with ϕm,j*(x0,j*) below the maximum of ϕm,j* over Cm,j*; then there exists a sufficiently small ε ≥ 0 such that the set Em,j* = {x ∈ Cm,j* : ϕm,j*(x) ≥ ϕm,j*(x0,j*) + ε} is a closed and convex subset of Cm,j* and x0,j* is not an interior point of Em,j*. Thus,

\begin{aligned}
L(l_m, Q) = \int l_m\, dQ \le \int \bar\phi_m(x)\, dQ &= \sum_{j \ne j^*} \int_{C_{m,j}} \phi_{m,j}(x)\, dQ + \int_{C_{m,j^*}} \phi_{m,j^*}\, dQ \\
&\le \sum_{j \ne j^*} M_{m,j}\, Q(C_{m,j}) + \phi_{m,j^*}(x_{0,j^*})\, Q(C_{m,j^*}) + \big(M_{m,j^*} - \phi_{m,j^*}(x_{0,j^*})\big)\, Q(E_{m,j^*}) \\
&\le \sum_{j=1}^{K} \max(M_{m,j}, 0) + \phi_{m,j^*}(x_{0,j^*})\, Q(C_{m,j^*})\big(1 - h_{j^*}(Q, x_{0,j^*})\big).
\end{aligned}

These inequalities hold when ϕm,j*(x0,j*) attains the maximum of ϕm,j* over Cm,j* as well (with ε = 0 accordingly). By Lemma 6, hj*(Q, x0,j*) < 1. Since Mm,j* is finite, interior(Cm,j*) is not empty. Consequently, lim infm→∞ Q(Cm,j*) > 0, which yields

\phi_{m,j^*}(x_{0,j^*}) \ge -\frac{\sum_{j=1}^{K} \max(M_{m,j}, 0) - L(l_m, Q)}{Q(C_{m,j^*})\big(1 - h_{j^*}(Q, x_{0,j^*})\big)} \ge -\frac{\sum_{j=1}^{K} \max(M_j, 0) - L(l_1, Q)}{Q(C_{m,j^*})\big(1 - h_{j^*}(Q, x_{0,j^*})\big)} > -\infty.

Hence, the set Hj = {x : lim infm→∞ ϕm,j(x) > −∞} is not empty for every j ∈ {1, …, K}. From Lemma 1 we conclude that for each ϕj, we can find finite positive constants aj, bj > 0 such that ϕj(x) ≤ aj − bj||x|| ≤ a − b||x||, where a = maxj aj > 0 and b = minj bj > 0. Then by Lemma 4, there exist a subsequence (ϕ1,m(k1))k1 of (ϕ1,m)m and a concave function ϕ1 such that:

\lim_{k_1 \to \infty,\, x \to y} \phi_{1,m(k_1)}(x) = \phi_1(y) \quad \text{for all } y \in \mathrm{interior}(\mathrm{dom}(\phi_1)),
\limsup_{k_1 \to \infty,\, x \to y} \phi_{1,m(k_1)}(x) \le \phi_1(y) \quad \text{for all } y \in \mathbb{R}^d.

If we define ϕ1 = −∞ on ℝd \ dom(ϕ1), then we can rewrite them as:

\limsup_{k_1 \to \infty} \phi_{1,m(k_1)}(x) \le \phi_1(x) \quad \text{for all } x \in \partial(\mathrm{dom}(\phi_1)),
\lim_{k_1 \to \infty} \phi_{1,m(k_1)}(x) = \phi_1(x) \quad \text{for all } x \in \mathbb{R}^d \setminus \partial(\mathrm{dom}(\phi_1)).

We can find a sub-subsequence of this subsequence with the analogous property for ϕ2,m(k2). Proceeding sequentially through all the ϕm,j's and λm,j's yields a common subsequence lm(k) and a function l*(x) = log Σj λj exp{ϕj(x)} such that:

\limsup_{k \to \infty} l_{m(k)}(x) \le l^*(x) \quad \text{for all } x \in \wp,
\lim_{k \to \infty} l_{m(k)}(x) = l^*(x) \quad \text{for all } x \in \mathbb{R}^d \setminus \wp,

where ℘ = ∪_{j=1}^K ∂(dom(ϕj)) and Leb(℘) = 0. The next step is to prove that l*(x) is the maximizer. Applying Fatou's lemma to the subsequence of functions lm(k)(x) ≤ a − b||x|| yields

\limsup_{k \to \infty} \int l_{m(k)}\, dQ \le \int l^*\, dQ.

Hence,

L(Q) \ge L(l^*, Q) \ge \limsup_{k \to \infty} L(l_{m(k)}, Q) = L(Q),

from which we conclude L(l*, Q) = L(Q). The first inequality follows from the definition of L(Q). The last equality follows because lm(k) is a sequence along which L(lm(k), Q) increases to L(Q) as k → ∞. This establishes the existence of the maximizer l*, and hence of the maximizing λj's and ϕj's.

Appendix C: Proof of Theorem 2

We prove the theorem for a subsequence of Qn. Let L(Qn) → Γ. As in the proof of Theorem 1, ln(x) ≤ a − b||x|| and infn ϕn,j(x0) > −∞ for some x0 ∈ interior(csupp(Q0)). Therefore, for a subsequence of (Qn)n, there exists a function l* such that ln(y), l*(y) ≤ a − b||y||, and

\limsup_{k \to \infty} l_{n(k)}(x) \le l^*(x) \quad \text{for all } x \in \wp,
\lim_{k \to \infty} l_{n(k)}(x) = l^*(x) \quad \text{for all } x \in \mathbb{R}^d \setminus \wp.

By Skorohod's theorem, there exists a probability space with random variables Xn ~ Qn and X ~ Q0 such that Xn → X almost surely. We define the random variable Hn = a − b||Xn|| − ln(Xn) ≥ 0. Applying Fatou's lemma to Hn, and writing γ = limn→∞ ∫ ||x||dQn = ∫ ||x||dQ0 (by the Mallows convergence), yields

\begin{aligned}
\Gamma = \lim_{n \to \infty} \int l_n\, dQ_n &= \lim_{n \to \infty} \Big[\int (a - b\|x\|)\, dQ_n - E(H_n)\Big] = a - b\gamma - \liminf_{n \to \infty} E(H_n) \\
&\le a - b\gamma - E\big(\liminf_{n \to \infty} H_n\big) \le a - b\gamma - E\big(a - b\|X\| - l^*(X)\big) \\
&= b\Big(\int \|x\|\, dQ_0 - \gamma\Big) + \int l^*(x)\, dQ_0 = L(l^*, Q_0) \le L(Q_0).
\end{aligned}

Let l0(x) = log Σj λj exp{ϕj(x)}, i.e., the λj's and ϕj's are the maximizers corresponding to l0 = l(·|Q0). In the following proof we utilize a special approximation scheme. Let l^{(ε)}(x) = log Σj λj^{(ε)} exp{ϕj^{(ε)}(x)} with λj^{(ε)} = λj and ϕj^{(ε)}(x) = inf_{vj, cj} (vj^T x + cj), where the infimum is over all (vj, cj) with ||vj|| ≤ ε^{−1} and ϕj(y) ≤ vj^T y + cj for all y. DSS 2011 show that the approximation ϕj^{(ε)} is real valued and Lipschitz continuous with constant ε^{−1}; consequently, l^{(ε)}(x) is also Lipschitz continuous with constant ε^{−1}. Moreover, ϕj^{(ε)} ≥ ϕj and ϕj^{(ε)} ↓ ϕj pointwise as ε ↓ 0. Thus, l^{(ε)} ↓ l0 pointwise as ε ↓ 0 and l^{(1)} ≥ l^{(ε)} ≥ l0 for ε ∈ (0, 1). With this approximation, it follows from the Lipschitz continuity, ∫ ||x||dQ0 = γ < ∞, and the stronger version of Lemma 5 that

\Gamma = \lim_{n \to \infty} \int l_n\, dQ_n \ge \lim_{n \to \infty} L(l^{(\varepsilon)}, Q_n) = \lim_{n \to \infty} \Big[\int l^{(\varepsilon)}\, dQ_n - \sum_{j=1}^{K} \pi_j \int e^{\phi_j^{(\varepsilon)}(x)}\, dx + 1\Big] = \int l^{(\varepsilon)}\, dQ_0 - \sum_{j=1}^{K} \pi_j \int \exp\{\phi_j^{(\varepsilon)}(x)\}\, dx + 1.

Applying the monotone convergence theorem to the functions l^{(1)} − l^{(ε)} and the dominated convergence theorem to the exp{ϕj^{(ε)}}'s yields lim_{ε↓0+} L(l^{(ε)}, Q0) = L(l0, Q0). Hence, Γ ≥ L(Q0). Combining this with Γ ≤ L(l*, Q0) ≤ L(Q0) yields Γ = L(Q0) = L(l*, Q0), which indicates that l* equals the maximizer l0 = l(·|Q0) corresponding to L(Q0).

Applying these results to the densities fn = exp{ln} and f0 = exp{l0} yields

\lim_{n \to \infty,\, x \to y} f_n(x) = f_0(y) \quad \text{for all } y \in \mathbb{R}^d \setminus \wp,
\limsup_{n \to \infty,\, x \to y} f_n(x) \le f_0(y) \quad \text{for all } y \in \wp,

where ℘ = ∪_{j=1}^K ∂({f0j > 0}) and Leb(℘) = 0. Consequently, (fn)n → f0 almost everywhere with respect to Lebesgue measure. In addition, |fn(x)| ≤ e^{a − b||x||}, and ∫ e^{a − b||x||} dx is finite. Applying Lebesgue's dominated convergence theorem yields

\lim_{n \to \infty} \int |f_n(x) - f_0(x)|\, dx = 0.

Consequently, Theorem 2 holds for a subsequence of the original sequence (Qn)n. It remains to show that it holds for the entire sequence.

Suppose one of the assertions about fn were false. Then one could replace the initial sequence (Qn)n from the start with a subsequence along which one of the following three conditions holds:

  1. limn→∞ fn(xn) > f0(y) for some sequence (xn)n converging to a point y;

  2. limn→∞ fn(xn) < f0(y) for some sequence (xn)n converging to a point y;

  3. limn→∞ ∫ |fn(x) − f0(x)|dx > 0.

Each of these three properties would be inherited by every further subsequence of (Qn)n, which leads to a contradiction.


Contributor Information

Hao Hu, Email: hhu5@ncsu.edu.

Yichao Wu, Email: wu@stat.ncsu.edu.

Weixin Yao, Email: weixin.yao@ucr.edu.

References

  1. Balabdaoui Fadoua, Doss Charles R. Inference for a mixture of symmetric distributions under log-concavity. 2014. arXiv preprint arXiv:1411.4708.
  2. Balabdaoui Fadoua, Rufibach Kaspar, Wellner Jon A. Limit distribution theory for maximum likelihood estimation of a log-concave density. Annals of Statistics. 2009;37(3):1299. doi: 10.1214/08-AOS609.
  3. Benaglia Tatiana, Chauveau Didier, Hunter David, Young Derek. mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software. 2009;32(6):1–29.
  4. Bordes Laurent, Vandekerkhove Pierre. Semiparametric two-component mixture model with a known component: an asymptotically normal estimator. Mathematical Methods of Statistics. 2010;19(1):22–41.
  5. Bordes Laurent, Mottelet Stéphane, Vandekerkhove Pierre. Semiparametric estimation of a two-component mixture model. The Annals of Statistics. 2006a;34(3):1204–1232.
  6. Bordes Laurent, Delmas Céline, Vandekerkhove Pierre. Semiparametric estimation of a two-component mixture model where one component is known. Scandinavian Journal of Statistics. 2006b;33(4):733–752.
  7. Butucea Cristina, Vandekerkhove Pierre. Semiparametric mixtures of symmetric distributions. Scandinavian Journal of Statistics. 2014;41(1):227–239.
  8. Campbell NA, Mahon RJ. A multivariate study of variation in two species of rock crab of the genus Leptograpsus. Australian Journal of Zoology. 1974;22(3):417–425.
  9. Chang George T, Walther Guenther. Clustering with mixtures of log-concave distributions. Computational Statistics & Data Analysis. 2007;51(12):6242–6251.
  10. Chee Chew-Seng, Wang Yong. Estimation of finite mixtures with symmetric components. Statistics and Computing. 2013;23(2):233–249.
  11. Chen Jiahua, Tan Xianming, Zhang Runchu. Inference for normal mixtures in mean and variance. Statistica Sinica. 2008;18(2):443.
  12. Chen Yining, Samworth Richard J. Smoothed log-concave maximum likelihood estimation with applications. Statistica Sinica. 2013;23:1373–1398.
  13. Cule Madeleine, Samworth Richard. Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density. Electronic Journal of Statistics. 2010;4:254–270.
  14. Cule Madeleine, Gramacy Robert, Samworth Richard. LogConcDEAD: An R package for maximum likelihood estimation of a multivariate log-concave density. Journal of Statistical Software. 2009;29(2):1–20.
  15. Cule Madeleine, Samworth Richard, Stewart Michael. Maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2010;72(5):545–607.
  16. Dempster Arthur P, Laird Nan M, Rubin Donald B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological). 1977:1–38.
  17. Doss Charles, Wellner Jon A. Global rates of convergence of the MLEs of log-concave and s-concave densities. 2013. arXiv preprint arXiv:1306.1438. doi: 10.1214/15-AOS1394.
  18. Dümbgen Lutz, Rufibach Kaspar. Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency. Bernoulli. 2009;15(1):40–68.
  19. Dümbgen Lutz, Samworth Richard, Schuhmacher Dominic. Approximation by log-concave distributions, with applications to regression. The Annals of Statistics. 2011;39(2):702–730.
  20. Hathaway Richard J. A constrained formulation of maximum-likelihood estimation for normal mixture distributions. The Annals of Statistics. 1985:795–800.
  21. Hohmann Daniel, Holzmann Hajo. Semiparametric location mixtures with distinct components. Statistics. 2013;47(2):348–362.
  22. Hubert Lawrence, Arabie Phipps. Comparing partitions. Journal of Classification. 1985;2(1):193–218.
  23. Hunter David R, Wang Shaoli, Hettmansperger Thomas P. Inference for mixtures of symmetric distributions. The Annals of Statistics. 2007:224–251.
  24. Kim Arlene KH, Samworth Richard J. Global rates of convergence in log-concave density estimation. 2014. arXiv preprint arXiv:1404.2298.
  25. Ma Yanyuan, Yao Weixin. Flexible estimation of a semiparametric two-component mixture model with one parametric component. Electronic Journal of Statistics. 2015;9:444–474.
  26. McLachlan Geoffrey, Krishnan Thriyambakam. The EM Algorithm and Extensions. Vol. 382. John Wiley & Sons; 2007.
  27. McLachlan Geoffrey, Peel David. Finite Mixture Models. John Wiley & Sons; 2000.
  28. McNicholas Paul David, Murphy Thomas Brendan. Parsimonious Gaussian mixture models. Statistics and Computing. 2008;18(3):285–296.
  29. Pal Jayanta Kumar, Woodroofe Michael, Meyer Mary. Estimating a Polya frequency function. Lecture Notes–Monograph Series. 2007:239–249.
  30. Rufibach Kaspar. Computing maximum likelihood estimators of a log-concave density function. Journal of Statistical Computation and Simulation. 2007;77(7):561–574.
  31. Stephens Matthew. Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2000;62(4):795–809.
  32. Walther Guenther. Detecting the presence of mixing with multiscale maximum likelihood. Journal of the American Statistical Association. 2002;97(458):508–513.
  33. Xiang Sijia, Yao Weixin, Wu Jingjing. Minimum profile Hellinger distance estimation for a semiparametric mixture model. Canadian Journal of Statistics. 2014;42(2):246–267.
  34. Yao Weixin. A profile likelihood method for normal mixture with unequal variance. Journal of Statistical Planning and Inference. 2010;140(7):2089–2098.
  35. Yao Weixin. Label switching and its solutions for frequentist mixture models. Journal of Statistical Computation and Simulation. 2015;85(5):1000–1012.
  36. Yao Weixin, Lindsay Bruce G. Bayesian mixture labeling by highest posterior density. Journal of the American Statistical Association. 2012.
