PLOS One. 2019 Apr 25;14(4):e0215529. doi: 10.1371/journal.pone.0215529

Using the sample maximum to estimate the parameters of the underlying distribution

Alex Capaldi 1, Tiffany N Kolba 1,*
Editor: Eugene Demidenko
PMCID: PMC6483189  PMID: 31022209

Abstract

We propose novel estimators for the parameters of an exponential distribution and a normal distribution when the only known information is a sample of sample maxima; i.e., the known information consists of a sample of m values, each of which is the maximum of a sample of n independent random variables drawn from the underlying exponential or normal distribution. We analyze the accuracy and precision of the estimators using extreme value theory, as well as through simulations of the sampling distributions. For the exponential distribution, the estimator of the mean is unbiased and its variance decreases as either m or n increases. Likewise, for the normal distribution, we show that the estimator of the mean has negligible bias and the estimator of the variance is unbiased. While the variance of the estimators for the normal distribution decreases as m, the number of sample maxima, increases, the variance increases as n, the sample size over which the maximum is computed, increases. We apply our method to estimate the mean length of pollen tubes in the flowering plant Arabidopsis thaliana, where the known biological information fits our context of a sample of sample maxima.

1 Introduction

Consider the scenario where one has obtained data where each observation is the maximum value of n independent, identically distributed random variables drawn from either an exponential distribution or a normal distribution with unknown parameters. That is, X_ij ~ Exp(β) or X_ij ~ N(μ, σ²) independently for i = 1, …, n and j = 1, …, m, and the known data consist of Y_j = max{X_ij : 1 ≤ i ≤ n} for j = 1, …, m. Here we present a process to estimate the mean β of the underlying exponential distribution, or the mean μ and variance σ² of the underlying normal distribution, from only the set of Y_j's.

Much previous research has been conducted in the field of extreme value theory on the distribution of the maximum from a sample of n independent, identically distributed random variables. In particular, the Fisher-Tippett-Gnedenko theorem states that the distribution of the sample maximum, after proper rescaling, can only converge to one of three types of distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution [1]. When the original underlying distribution is exponential or normal, the limiting distribution of the rescaled sample maximum is the Gumbel distribution. Extreme value theory has been applied in many applications, such as estimating the probability of an extreme flood, severe adverse side effect of a drug, maximum environmental load on a structure, or large insurance loss [2]. In these applications, the underlying distribution and its parameters are typically known and the focus is on estimating the probability distribution of the sample maximum. The focus of our scenario is novel in that we are using a known sample of sample maxima to estimate unknown parameters of the underlying distribution.

Standard techniques for estimating unknown parameters, such as method of moments or maximum likelihood estimation, typically assume that the known information consists of a sample of observations drawn directly from the underlying population distribution. However, in our scenario under consideration, the direct observations are unknown. Rather, we only know the maximum value of each sample of direct observations. The estimators we propose in the following sections are the first, to our knowledge, for estimating unknown population parameters when the only known information is a sample of sample maxima.

2 Estimator for exponential distribution

We begin by considering the case where the underlying distribution is exponential with unknown mean β. In Theorem 1 below, we propose an estimator for β and compute its expected value and variance.

Theorem 1. Let X_ij, for i = 1, …, n and j = 1, …, m, be independent, identically distributed Exp(β) random variables, and let Y_j = max{X_ij : 1 ≤ i ≤ n}. Set

\hat{\beta} = \frac{\bar{Y}}{H_n} = \frac{1}{m H_n} \sum_{j=1}^{m} Y_j. \quad (1)

Then

E(\hat{\beta}) = \beta, \qquad \operatorname{var}(\hat{\beta}) = \frac{\beta^2 G_n}{m H_n^2} \quad (2)

where

H_n = \sum_{i=1}^{n} \frac{1}{i}, \qquad G_n = \sum_{i=1}^{n} \frac{1}{i^2}.

Proof. From the formula for β^ given in Eq (1), it directly follows that

E(\hat{\beta}) = \frac{E(Y_j)}{H_n}, \qquad \operatorname{var}(\hat{\beta}) = \frac{\operatorname{var}(Y_j)}{m H_n^2}.

Hence, it only remains to compute the expected value and variance of the maximum of a single sample of n independent Exp(β) random variables. Let X_(i) denote the i-th smallest observation from such a sample. Then Y_j = X_(n) can be decomposed as the following telescoping sum:

Y_j = X_{(n)} = X_{(1)} + (X_{(2)} - X_{(1)}) + \cdots + (X_{(n)} - X_{(n-1)}).

Due to the memoryless property of the exponential distribution, X_(2) − X_(1) is independent of X_(1). Moreover, while X_(1) is the minimum of n independent Exp(β) random variables, X_(2) − X_(1) can be viewed as the minimum of a sample of n − 1 independent Exp(β) random variables. Likewise, all of the terms in the telescoping sum for Y_j = X_(n) are independent, with X_(i+1) − X_(i) equal in distribution to the minimum of a sample of n − i independent Exp(β) random variables, which in turn is equal in distribution to an Exp(β/(n − i)) random variable. Thus,

E(Y_j) = E(X_{(n)}) = \frac{\beta}{n} + \frac{\beta}{n-1} + \cdots + \frac{\beta}{2} + \beta = \beta \sum_{i=1}^{n} \frac{1}{i} = \beta H_n

and

\operatorname{var}(Y_j) = \operatorname{var}(X_{(n)}) = \frac{\beta^2}{n^2} + \frac{\beta^2}{(n-1)^2} + \cdots + \frac{\beta^2}{2^2} + \beta^2 = \beta^2 \sum_{i=1}^{n} \frac{1}{i^2} = \beta^2 G_n.

Substituting these expressions for E(Yj) and var(Yj) into the equations for the expected value and variance of β^ produces the formulas given in Eq (2).

As a consequence of Theorem 1, we have shown that β̂ is an unbiased estimator for β and that its variance decreases at a rate proportional to 1/m. Since G_n → π²/6 and H_n → ∞ at rate log n as n → ∞, the variance of β̂ decreases at a rate proportional to 1/(log n)². Thus, the precision of the estimator β̂ can be improved more rapidly by increasing m, the number of sample maxima, compared to increasing n, the sample size over which the maximum is computed.
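The claims of Theorem 1 are straightforward to check by Monte Carlo simulation. The sketch below (the values of β, n, m, and the number of repetitions are illustrative choices, not taken from the paper) verifies that β̂ from Eq (1) is unbiased and that its empirical variance matches β²G_n/(mH_n²):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0        # true mean of the underlying exponential (illustrative)
n, m = 50, 10     # sample size per maximum, number of sample maxima
reps = 10000      # Monte Carlo realizations of the estimator

H_n = (1.0 / np.arange(1, n + 1)).sum()        # harmonic number H_n
G_n = (1.0 / np.arange(1, n + 1) ** 2).sum()   # G_n

# Each realization: m maxima, each the max of n Exp(beta) draws.
Y = rng.exponential(beta, size=(reps, m, n)).max(axis=2)
beta_hat = Y.mean(axis=1) / H_n                # estimator from Eq (1)

print(beta_hat.mean())                         # close to beta = 2.0
print(beta_hat.var(), beta**2 * G_n / (m * H_n**2))  # nearly equal
```

The empirical mean of β̂ lands on β and the empirical variance matches the Eq (2) formula to within Monte Carlo error.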

3 Estimators for normal distribution

We now consider the case where the underlying distribution is normal with unknown mean μ and unknown variance σ2. In Theorem 2 below, we propose estimators for μ and σ2 and analyze their expected value, while in Theorem 3 we analyze the variance of the estimators.

Theorem 2. Let X_ij, for i = 1, …, n and j = 1, …, m, be independent, identically distributed N(μ, σ²) random variables, and let Y_j = max{X_ij : 1 ≤ i ≤ n}. Let Ȳ and S_Y² denote the sample mean and sample variance, respectively, of the Y_j's. Set

\hat{\mu} = \bar{Y} - \frac{k_n}{\sqrt{c_n}} S_Y, \qquad \hat{\sigma}^2 = \frac{S_Y^2}{c_n} \quad (3)

where k_n denotes the mean and c_n denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then E(σ̂²) = σ², while E(μ̂) > μ with E(μ̂) → μ as m → ∞.

Proof. The cumulative distribution function of Yj is given by

F_{Y_j}(y) = P(Y_j \le y) = [P(X_{ij} \le y)]^n = \left[ P\!\left( \frac{X_{ij} - \mu}{\sigma} \le \frac{y - \mu}{\sigma} \right) \right]^n = \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^n.

Differentiating, the probability density function of Yj is

f_{Y_j}(y) = n \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^{n-1} \phi\!\left( \frac{y - \mu}{\sigma} \right) \frac{1}{\sigma},

where Φ(z) denotes the cumulative distribution function and ϕ(z) denotes the probability density function of a N(0, 1) random variable. The expected value of Yj can then be calculated as

E(Y_j) = \int_{-\infty}^{\infty} y\, f_{Y_j}(y)\, dy = \int_{-\infty}^{\infty} y\, n \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^{n-1} \phi\!\left( \frac{y - \mu}{\sigma} \right) \frac{1}{\sigma}\, dy = \mu + \sigma \int_{-\infty}^{\infty} z\, n [\Phi(z)]^{n-1} \phi(z)\, dz = \mu + \sigma k_n.

We also obtain that the variance of Yj is

\operatorname{var}(Y_j) = \int_{-\infty}^{\infty} (y - E(Y_j))^2 f_{Y_j}(y)\, dy = \int_{-\infty}^{\infty} (y - \mu - \sigma k_n)^2\, n \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^{n-1} \phi\!\left( \frac{y - \mu}{\sigma} \right) \frac{1}{\sigma}\, dy = \sigma^2 \int_{-\infty}^{\infty} (z - k_n)^2\, n [\Phi(z)]^{n-1} \phi(z)\, dz = \sigma^2 c_n.

We can now use the equations for E(Y_j) and var(Y_j) to compute the expected values of the estimators. In particular, the expected value of the estimator of the variance is

E(\hat{\sigma}^2) = \frac{E(S_Y^2)}{c_n} = \frac{\operatorname{var}(Y_j)}{c_n} = \frac{\sigma^2 c_n}{c_n} = \sigma^2,

while the expected value of the estimator of the mean satisfies

E(\hat{\mu}) = E(Y_j) - \frac{k_n}{\sqrt{c_n}} E(S_Y) > (\mu + \sigma k_n) - \frac{k_n}{\sqrt{c_n}} (\sigma \sqrt{c_n}) = \mu.

Although Jensen’s inequality implies that μ̂ has positive bias, since E(S_Y) < √(E(S_Y²)) = √(var(Y_j)), the bias of the sample standard deviation goes to zero as the sample size increases. Hence, E(μ̂) → μ as m → ∞.

Note that the constants k_n and c_n that appear in the formulas for the estimators depend only upon the sample size n. Exact integral expressions for k_n and c_n are given in the proof of Theorem 2, but the integrals cannot be evaluated in closed form. However, the constants can be approximated either analytically or numerically.

In [3], Cramér showed that b_n(Z_(n) − b_n) converges in distribution to the standard Gumbel distribution, where Z_(n) is the maximum of a sample of n independent, identically distributed N(0, 1) random variables and

b_n = (2 \log n)^{1/2} - \frac{\log(4\pi \log n)}{2 (2 \log n)^{1/2}}.

Since the standard Gumbel distribution has mean equal to γ ≈ 0.5772, the Euler-Mascheroni constant, and variance equal to π²/6, we can use these values to approximate

k_n \approx \frac{\gamma}{b_n} + b_n, \qquad c_n \approx \frac{\pi^2}{6 b_n^2}. \quad (4)

Fig 1 displays the values of the analytic approximations for k_n and c_n given in Eq (4) for n ranging from 10 to 100,000, along with corresponding numerical approximations, plotted on a semi-log scale. The numerical approximations for k_n and c_n were computed from 10,000 realizations of a simulation of the maximum.
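Both approximation routes can be sketched in a few lines; here n = 1000 and the simulation size are illustrative choices. Consistent with the gap visible in Fig 1, the analytic values from Eq (4) come out slightly above the numerical ones at moderate n:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # illustrative sample size

# Analytic approximation via Cramer's normalizing constant b_n and Eq (4).
b_n = np.sqrt(2 * np.log(n)) - np.log(4 * np.pi * np.log(n)) / (2 * np.sqrt(2 * np.log(n)))
gamma = 0.57721566  # Euler-Mascheroni constant
k_analytic = gamma / b_n + b_n
c_analytic = np.pi**2 / (6 * b_n**2)

# Numerical approximation from 10,000 simulated maxima of n N(0,1) draws.
Z = rng.standard_normal((10000, n)).max(axis=1)
k_numeric, c_numeric = Z.mean(), Z.var()

print(k_analytic, k_numeric)  # analytic slightly above numerical
print(c_analytic, c_numeric)
```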

Fig 1. From the top curve to the bottom, the plot displays the values of the analytic approximation for k_n (solid) and the numerical approximation for k_n (dashed), along with the analytic approximation for c_n (dotted) and the numerical approximation for c_n (dot-dashed).


While Theorem 2 showed that μ̂ is positively biased, with the bias approaching zero as m → ∞, the estimation bias is fairly minimal even for relatively small values of m. Fig 2 displays the sampling distributions of the estimators for μ and σ² for varying values of m and n. In all the simulations, the sampling distribution of μ̂ is fairly centered around the true value of μ = 0. Setting the true value of μ to a nonzero value simply shifts the sampling distribution of μ̂ and has no effect on σ̂². We also observe from Fig 2 that the variability of the sampling distributions of the estimators decreases as m, the number of sample maxima, increases, but increases as n, the sample size over which the maximum is computed, increases. We derive an analytical justification for this behavior in Theorem 3 below.
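A minimal simulation of these sampling distributions, with illustrative parameter choices and with k_n and c_n approximated numerically rather than from Eq (4), confirms that the bias of μ̂ is negligible and that σ̂² is unbiased:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 50        # illustrative sample sizes
mu, sigma = 0.0, 1.0 # true parameters of the underlying normal

# Numerically approximate k_n and c_n (mean and variance of the max of n N(0,1)).
Z = rng.standard_normal((100000, n)).max(axis=1)
k_n, c_n = Z.mean(), Z.var()

# Sampling distributions of the estimators in Eq (3) over many realizations.
reps = 2000
Y = mu + sigma * rng.standard_normal((reps, m, n)).max(axis=2)
S = Y.std(axis=1, ddof=1)              # sample std dev of the m maxima
mu_hat = Y.mean(axis=1) - k_n / np.sqrt(c_n) * S
sig2_hat = S**2 / c_n

print(mu_hat.mean())    # small positive bias, near mu = 0
print(sig2_hat.mean())  # near sigma^2 = 1
```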

Fig 2. Estimates of μ̂ (top) and σ̂² (bottom) from 100 realizations with n, m = 10, 100, and 1000.


The horizontal lines indicate the true values of μ and σ2.

Theorem 3. Let X_ij, for i = 1, …, n and j = 1, …, m, be independent, identically distributed N(μ, σ²) random variables, and let Y_j = max{X_ij : 1 ≤ i ≤ n}. Let Ȳ and S_Y² denote the sample mean and sample variance, respectively, of the Y_j's. Set

\hat{\mu} = \bar{Y} - \frac{k_n}{\sqrt{c_n}} S_Y, \qquad \hat{\sigma}^2 = \frac{S_Y^2}{c_n}

where k_n denotes the mean and c_n denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then

\operatorname{var}(\hat{\sigma}^2) = \sigma^4 \left( \frac{2}{m-1} + \frac{\kappa}{m} \right)

and

\operatorname{var}(\hat{\mu}) \approx \frac{\sigma^2 c_n}{m} + \frac{\sigma^2 k_n^2}{4} \left( \frac{2}{m-1} + \frac{\kappa}{m} \right) - \frac{\sigma^2 k_n \sqrt{c_n}\, \gamma_1}{m}

where γ₁ is the skewness and κ is the excess kurtosis of the distribution of Y_j.

Proof. The variance of S_Y² can be computed as

\operatorname{var}(S_Y^2) = (\operatorname{var}(Y_j))^2 \left( \frac{2}{m-1} + \frac{\kappa}{m} \right),

where κ is the excess kurtosis of the distribution of Y_j [4]. Since we computed that var(Y_j) = σ²c_n in Theorem 2, we obtain the desired result

\operatorname{var}(\hat{\sigma}^2) = \operatorname{var}\!\left( \frac{S_Y^2}{c_n} \right) = \sigma^4 \left( \frac{2}{m-1} + \frac{\kappa}{m} \right)

for the variance of the variance estimator. Now for the variance of the mean estimator, we compute that

\operatorname{var}(\hat{\mu}) = \operatorname{var}\!\left( \bar{Y} - \frac{k_n}{\sqrt{c_n}} S_Y \right) = \operatorname{var}(\bar{Y}) + \frac{k_n^2}{c_n} \operatorname{var}(S_Y) - \frac{2 k_n}{\sqrt{c_n}} \operatorname{cov}(\bar{Y}, S_Y).

To simplify the above expression, we use the fact that var(Ȳ) = var(Y_j)/m, along with the approximations

\operatorname{var}(S_Y) \approx \frac{\operatorname{var}(S_Y^2)}{4 \operatorname{var}(Y_j)}, \qquad \operatorname{cov}(\bar{Y}, S_Y) \approx \frac{\operatorname{cov}(\bar{Y}, S_Y^2)}{2 \sqrt{\operatorname{var}(Y_j)}} = \frac{\operatorname{var}(Y_j)\, \gamma_1}{2m},

where γ₁ is the skewness of the distribution of Y_j [5]. Using these approximations, we obtain the desired result

\operatorname{var}(\hat{\mu}) \approx \frac{\sigma^2 c_n}{m} + \frac{\sigma^2 k_n^2}{4} \left( \frac{2}{m-1} + \frac{\kappa}{m} \right) - \frac{\sigma^2 k_n \sqrt{c_n}\, \gamma_1}{m}

for the variance of the mean estimator.

As n increases, the value of κ increases monotonically from 0, the excess kurtosis of the normal distribution, to 12/5, the excess kurtosis of the Gumbel distribution [1]. Thus, from Theorem 3, we observe that the variance of the sampling distribution of σ̂² increases proportionally to σ⁴ as σ increases and decreases proportionally to 1/m as m increases, but only slightly increases as n increases.

As with the excess kurtosis, the skewness of the distribution of Y_j increases monotonically from the value for the normal distribution, i.e., γ₁ = 0, to the value for the standard Gumbel distribution, i.e., γ₁ ≈ 1.13955, as n increases [6]. Since the constant c_n decreases towards 0 while the constant k_n increases towards infinity, the dominant term in the variance of μ̂ increases proportionally to k_n² as n increases. We also observe from Theorem 3 that the variance of the sampling distribution of μ̂ increases proportionally to σ² as σ increases and decreases proportionally to 1/m as m increases. These relationships explain the behavior of the sampling distributions displayed in Fig 2.
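The variance formula for σ̂² in Theorem 3 can also be checked numerically. In the sketch below (illustrative n and m, with σ = 1, and with c_n and κ themselves estimated by simulation rather than taken from any table), the empirical variance of σ̂² agrees with the predicted σ⁴(2/(m − 1) + κ/m):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 20, 30      # illustrative sample sizes (sigma = 1)
reps = 5000        # Monte Carlo realizations of sigma^2-hat

# Estimate c_n and the excess kurtosis kappa of Y = max of n N(0,1) draws.
Z = rng.standard_normal((200000, n)).max(axis=1)
c_n = Z.var()
kappa = ((Z - Z.mean())**4).mean() / Z.var()**2 - 3

# Empirical variance of sigma^2-hat across many realizations.
Y = rng.standard_normal((reps, m, n)).max(axis=2)
sig2_hat = Y.var(axis=1, ddof=1) / c_n

predicted = 2 / (m - 1) + kappa / m   # Theorem 3 with sigma^4 = 1
print(sig2_hat.var(), predicted)      # nearly equal
```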

4 Biological application

During fertilization in flowering plants, once pollen grains land on the stigma, they grow tubes that travel down through a transmitting tract from the stigma toward an ovule. Pollen compete against each other in a race toward the limited number of ovules, which determines which pollen will father the seeds. The mean length of the population of pollen tubes at various time points is of interest to plant biologists, yet, to date, only the lengths of the longest pollen tubes in such competitions have been measured [7]. Since pollen tube lengths must be positive, it is reasonable to assume that the lengths follow an exponential distribution. Hence, our method described in Section 2 allows the mean pollen tube length to be estimated given the structure of the experimental data.

In [7], Swanson et al. measured the longest pollen tube lengths at four time points for two accessions (i.e., specific geographical populations) of Arabidopsis thaliana in a laboratory setting. For both the Columbia and Landsberg accessions, either m = 8 or m = 9 individual plants were used for each time point. The average number of pollen tubes within each plant was n = 933 for the Columbia accession and n = 727 for the Landsberg accession. Table 1 reports the sample mean of the longest pollen tube from the m plants for each accession after 3, 6, 9, and 24 hours. Using these sample means of the longest lengths and the estimator in Eq (1) of Theorem 1, we then estimated the overall mean length for each accession at each time point, with standard errors computed from the variance formula in Eq (2). The resulting estimates and their standard errors are listed in Table 1.

Table 1. The sample mean, ȳ, of the longest pollen tube from m plants, where each plant contained an average of n pollen tubes, along with the corresponding estimated mean pollen tube length, β̂, and its standard error, SE(β̂), for two accessions of Arabidopsis thaliana at varying time points.

Accession            time (hrs)   m   ȳ (mm)   β̂ (mm)   SE(β̂) (mm)
Columbia (n = 933)        3       9   0.690     0.093     0.005
                          6       8   1.069     0.144     0.009
                          9       9   2.538     0.342     0.020
                         24       9   2.778     0.375     0.022
Landsberg (n = 727)       3       9   0.474     0.066     0.004
                          6       8   0.676     0.094     0.006
                          9       8   1.795     0.251     0.016
                         24       9   2.325     0.324     0.019
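As an illustration of how the entries of Table 1 follow from Theorem 1, the Columbia 3-hour row can be reproduced directly from ȳ, m, and n:

```python
import numpy as np

# Reproduce the Columbia accession, 3-hour row of Table 1.
n, m = 933, 9      # average pollen tubes per plant, number of plants
ybar = 0.690       # sample mean of the longest tube length (mm)

H_n = (1.0 / np.arange(1, n + 1)).sum()        # harmonic number H_n
G_n = (1.0 / np.arange(1, n + 1) ** 2).sum()   # G_n

beta_hat = ybar / H_n                   # Eq (1): estimated mean tube length
se = beta_hat * np.sqrt(G_n / m) / H_n  # standard error from Eq (2)

print(round(beta_hat, 3), round(se, 3))  # 0.093 0.005
```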

To further evaluate the validity of the assumption that the population of pollen tube lengths is exponentially distributed, we produced Q-Q plots of the distribution of the maximum pollen tube length from the laboratory experiments performed by Swanson et al. versus the distribution of the maximum from an exponential distribution. The theoretical distribution of the maximum from an exponential distribution was simulated using 10,000 realizations of Y_j = max{X_ij : 1 ≤ i ≤ n}, where the X_ij are independent, identically distributed Exp(β̂) random variables, using the values of n and β̂ listed in Table 1. The Q-Q plots, displayed in Fig 3, show a roughly linear relationship, supporting the assumption that the underlying distribution of pollen tube lengths is exponential. Moreover, we performed a Kolmogorov-Smirnov test of the equality of the empirical and theoretical distributions for each accession of Arabidopsis thaliana at each time point. The smallest resulting p-value was 0.52 (corresponding to the Landsberg accession at 9 hours), further indicating that there is no evidence that the distribution of pollen tube lengths differs significantly from an exponential distribution.
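The observed maxima from [7] are not reproduced here, but the theoretical side of such a Q-Q plot can be sketched as follows, using the Columbia 3-hour values from Table 1 (the quantile probabilities below are one conventional plotting-position choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 933, 9        # Columbia accession at 3 hours
beta_hat = 0.093     # estimated mean tube length (mm) from Table 1

# Simulate the theoretical distribution of the maximum of n Exp(beta_hat)
# tube lengths, using 10,000 realizations as in the text.
Y_sim = rng.exponential(beta_hat, size=(10000, n)).max(axis=1)

# Theoretical quantiles to plot against the m observed maxima
# (the observed values come from the Swanson et al. data, not shown here).
probs = (np.arange(1, m + 1) - 0.5) / m
qq_theory = np.quantile(Y_sim, probs)
print(qq_theory)
```

Plotting these m theoretical quantiles against the sorted observed maxima yields one panel of Fig 3.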

Fig 3. Q-Q plots of the distribution of the maximum pollen tube length from laboratory experiments versus the distribution of the maximum from an exponential distribution.


Acknowledgments

This work was supported by the National Science Foundation under grant IOS-1645508. We would also like to thank Swanson et al. for providing the data for the biological example.

Data Availability

All relevant data are within the manuscript, Supporting Information files, and on the OSF data repository: https://osf.io/zg4ka/.

Funding Statement

A.C. was supported by the National Science Foundation under grant IOS-1645508 (https://www.nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. David HA, Nagaraja HN. Order statistics. Wiley; 1970.
  • 2. Castillo E, Hadi AS, Balakrishnan N, Sarabia JM. Extreme value and related models with applications in engineering and science. Hoboken, NJ: Wiley; 2005.
  • 3. Cramér H. Mathematical methods of statistics. Princeton University Press; 1946.
  • 4. Mood AM, Graybill FA, Boes DC. Introduction to the theory of statistics. McGraw-Hill; 1974.
  • 5. Rao CR. Linear statistical inference and its applications. 2nd ed. New York: Wiley; 1973.
  • 6. Gumbel EJ. Statistics of extremes. Courier Corporation; 2012.
  • 7. Swanson RJ, Hammond AT, Carlson AL, Gong H, Donovan TK. Pollen performance traits reveal prezygotic nonrandom mating and interference competition in Arabidopsis thaliana. American Journal of Botany. 2016 Feb;103(3):498–513. doi: 10.3732/ajb.1500172


