Abstract
We propose novel estimators for the parameters of an exponential distribution and a normal distribution when the only known information is a sample of sample maxima; i.e., the known information consists of a sample of m values, each of which is the maximum of a sample of n independent random variables drawn from the underlying exponential or normal distribution. We analyze the accuracy and precision of the estimators using extreme value theory, as well as through simulations of the sampling distributions. For the exponential distribution, the estimator of the mean is unbiased and its variance decreases as either m or n increases. Likewise, for the normal distribution, we show that the estimator of the mean has negligible bias and the estimator of the variance is unbiased. While the variance of the estimators for the normal distribution decreases as m, the number of sample maxima, increases, the variance increases as n, the sample size over which the maximum is computed, increases. We apply our method to estimate the mean length of pollen tubes in the flowering plant Arabidopsis thaliana, where the known biological information fits our context of a sample of sample maxima.
1 Introduction
Consider the scenario where one has obtained data in which each observation is the maximum value of n independent, identically distributed random variables drawn from either an exponential distribution or a normal distribution with unknown parameters. That is, $X_{ij} \sim \text{Exp}(\beta)$ or $X_{ij} \sim N(\mu, \sigma^2)$ for $i = 1, \ldots, n$, and the known data are $Y_j = \max_{1 \le i \le n} X_{ij}$ for $j = 1, \ldots, m$. Here we present a process to estimate the mean $\beta$ of the underlying exponential distribution or the mean $\mu$ and variance $\sigma^2$ of the underlying normal distribution from only the set of $Y_j$'s.
Extensive research in extreme value theory has addressed the distribution of the maximum from a sample of n independent, identically distributed random variables. In particular, the Fisher-Tippett-Gnedenko theorem states that the distribution of the sample maximum, after proper rescaling, can only converge to one of three types of distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution [1]. When the original underlying distribution is exponential or normal, the limiting distribution of the rescaled sample maximum is the Gumbel distribution. Extreme value theory has been used in many applications, such as estimating the probability of an extreme flood, a severe adverse side effect of a drug, the maximum environmental load on a structure, or a large insurance loss [2]. In these applications, the underlying distribution and its parameters are typically known and the focus is on estimating the probability distribution of the sample maximum. The focus of our scenario is novel in that we use a known sample of sample maxima to estimate unknown parameters of the underlying distribution.
Standard techniques for estimating unknown parameters, such as method of moments or maximum likelihood estimation, typically assume that the known information consists of a sample of observations drawn directly from the underlying population distribution. However, in our scenario under consideration, the direct observations are unknown. Rather, we only know the maximum value of each sample of direct observations. The estimators we propose in the following sections are the first, to our knowledge, for estimating unknown population parameters when the only known information is a sample of sample maxima.
2 Estimator for exponential distribution
We begin by considering the case where the underlying distribution is exponential with unknown mean β. In Theorem 1 below, we propose an estimator for β and compute its expected value and variance.
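Theorem 1. Let $X_{ij} \sim \text{Exp}(\beta)$ for $i = 1, \ldots, n$ and $Y_j = \max_{1 \le i \le n} X_{ij}$ for $j = 1, \ldots, m$. Set
$$\hat{\beta} = \frac{\bar{Y}}{H_n} = \frac{1}{m H_n} \sum_{j=1}^{m} Y_j. \tag{1}$$
Then
$$E(\hat{\beta}) = \beta \quad \text{and} \quad \text{var}(\hat{\beta}) = \frac{\beta^2}{m H_n^2} \sum_{i=1}^{n} \frac{1}{i^2}, \tag{2}$$
where $H_n = \sum_{i=1}^{n} \frac{1}{i}$ is the $n$th harmonic number.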
Proof. From the formula for $\hat{\beta}$ given in Eq (1), it directly follows that
$$E(\hat{\beta}) = \frac{E(Y_j)}{H_n} \quad \text{and} \quad \text{var}(\hat{\beta}) = \frac{\text{var}(Y_j)}{m H_n^2}.$$
Hence, it only remains to compute the expected value and variance of the maximum of a single sample of n independent Exp(β) random variables. Let $X_{(i)}$ denote the $i$th smallest observation from such a sample, with the convention $X_{(0)} = 0$. Then $Y_j = X_{(n)}$ can be decomposed as the following telescoping sum:
$$X_{(n)} = X_{(1)} + (X_{(2)} - X_{(1)}) + \cdots + (X_{(n)} - X_{(n-1)}).$$
Due to the memoryless property of the exponential distribution, $X_{(2)} - X_{(1)}$ is independent of $X_{(1)}$. Moreover, while $X_{(1)}$ is the minimum of n independent Exp(β) random variables, $X_{(2)} - X_{(1)}$ can be viewed as the minimum of a sample of n − 1 independent Exp(β) random variables. Likewise, all of the terms in the telescoping sum for $Y_j = X_{(n)}$ are independent, with $X_{(i+1)} - X_{(i)}$ equal in distribution to the minimum of a sample of n − i independent Exp(β) random variables, which in turn is equal in distribution to $\text{Exp}(\beta/(n - i))$. Thus,
$$E(Y_j) = \sum_{i=0}^{n-1} \frac{\beta}{n - i} = \beta H_n$$
and
$$\text{var}(Y_j) = \sum_{i=0}^{n-1} \frac{\beta^2}{(n - i)^2} = \beta^2 \sum_{i=1}^{n} \frac{1}{i^2}.$$
Substituting these expressions for $E(Y_j)$ and $\text{var}(Y_j)$ into the equations for the expected value and variance of $\hat{\beta}$ produces the formulas given in Eq (2).
As a consequence of Theorem 1, we have shown that $\hat{\beta}$ is an unbiased estimator for $\beta$ and that its variance decreases at a rate proportional to $1/m$. Since $\sum_{i=1}^{n} 1/i^2 \to \pi^2/6$ and $H_n \to \infty$ at rate $\log n$ as $n \to \infty$, the variance of $\hat{\beta}$ decreases at a rate proportional to $1/(\log n)^2$. Thus, the precision of the estimator can be improved more rapidly by increasing m, the number of sample maxima, compared to increasing n, the sample size over which the maximum is computed.
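The estimator of Theorem 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the simulated values (β = 2, n = 50, m = 2000), and the seed are assumptions chosen for the example, and the standard error is obtained by plugging the estimate into the variance formula of Eq (2).

```python
import math
import random

def harmonic(n, power=1):
    """Return sum_{i=1}^{n} 1 / i**power (the harmonic number H_n when power = 1)."""
    return sum(1.0 / i**power for i in range(1, n + 1))

def estimate_exp_mean(maxima, n):
    """Estimate beta from a sample of maxima; return (beta_hat, standard_error)."""
    m = len(maxima)
    h_n = harmonic(n)
    beta_hat = (sum(maxima) / m) / h_n
    # Plug-in standard error: replace the unknown beta in Eq (2) by beta_hat.
    var_hat = beta_hat**2 * harmonic(n, 2) / (m * h_n**2)
    return beta_hat, math.sqrt(var_hat)

def simulate_maxima(beta, n, m, rng):
    """Draw m maxima, each over n independent exponential variables with mean beta."""
    # random.expovariate takes the rate, i.e. 1 / mean.
    return [max(rng.expovariate(1.0 / beta) for _ in range(n)) for _ in range(m)]

# Hypothetical settings, chosen only for illustration.
rng = random.Random(0)
beta_true, n, m = 2.0, 50, 2000
beta_hat, se = estimate_exp_mean(simulate_maxima(beta_true, n, m, rng), n)
print(f"beta_hat = {beta_hat:.3f}, plug-in standard error = {se:.3f}")
```

Rerunning with a larger m shrinks the standard error roughly like $1/\sqrt{m}$, while increasing n helps only at the much slower $1/\log n$ rate.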
3 Estimators for normal distribution
We now consider the case where the underlying distribution is normal with unknown mean $\mu$ and unknown variance $\sigma^2$. In Theorem 2 below, we propose estimators for $\mu$ and $\sigma^2$ and analyze their expected value, while in Theorem 3 we analyze the variance of the estimators.
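Theorem 2. Let $X_{ij} \sim N(\mu, \sigma^2)$ for $i = 1, \ldots, n$ and $Y_j = \max_{1 \le i \le n} X_{ij}$ for $j = 1, \ldots, m$. Let $\bar{Y}$ and $S_Y^2$ denote the sample mean and sample variance, respectively, of the $Y_j$'s. Set
$$\hat{\mu} = \bar{Y} - k_n \frac{S_Y}{\sqrt{c_n}} \quad \text{and} \quad \hat{\sigma}^2 = \frac{S_Y^2}{c_n}, \tag{3}$$
where $k_n$ denotes the mean and $c_n$ denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then $E(\hat{\sigma}^2) = \sigma^2$, while $E(\hat{\mu}) \ge \mu$ with $E(\hat{\mu}) \to \mu$ as $m \to \infty$.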
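Proof. The cumulative distribution function of $Y_j$ is given by
$$F_{Y_j}(y) = P(X_{1j} \le y, \ldots, X_{nj} \le y) = \left[\Phi\left(\frac{y - \mu}{\sigma}\right)\right]^n.$$
Differentiating, the probability density function of $Y_j$ is
$$f_{Y_j}(y) = \frac{n}{\sigma} \left[\Phi\left(\frac{y - \mu}{\sigma}\right)\right]^{n-1} \phi\left(\frac{y - \mu}{\sigma}\right),$$
where $\Phi(z)$ denotes the cumulative distribution function and $\phi(z)$ denotes the probability density function of a N(0, 1) random variable. With the substitution $z = (y - \mu)/\sigma$, the expected value of $Y_j$ can then be calculated as
$$E(Y_j) = \int_{-\infty}^{\infty} y \, f_{Y_j}(y) \, dy = \mu + \sigma \int_{-\infty}^{\infty} z \, n \, [\Phi(z)]^{n-1} \phi(z) \, dz = \mu + \sigma k_n.$$
We also obtain that the variance of $Y_j$ is
$$\text{var}(Y_j) = \sigma^2 \left[ \int_{-\infty}^{\infty} z^2 \, n \, [\Phi(z)]^{n-1} \phi(z) \, dz - k_n^2 \right] = \sigma^2 c_n.$$
We can now use the equations for $E(Y_j)$ and $\text{var}(Y_j)$ to compute the expected value of the estimators. In particular, the expected value of the estimator of the variance is
$$E(\hat{\sigma}^2) = \frac{E(S_Y^2)}{c_n} = \frac{\text{var}(Y_j)}{c_n} = \sigma^2,$$
while the expected value of the estimator of the mean is
$$E(\hat{\mu}) = E(\bar{Y}) - k_n E(\hat{\sigma}) = \mu + k_n \left( \sigma - E(\hat{\sigma}) \right).$$
Although Jensen's inequality implies that $\hat{\mu}$ has positive bias since $E(\hat{\sigma}) \le \sqrt{E(\hat{\sigma}^2)} = \sigma$, the bias of the sample standard deviation goes to zero as the sample size increases. Hence, $E(\hat{\mu}) \to \mu$ as $m \to \infty$.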
Note that the constants $k_n$ and $c_n$ that appear in the formulas for the estimators depend only upon the sample size n. Exact integral expressions exist for $k_n$ and $c_n$ and are given in the proof of Theorem 2, but the integrals cannot be evaluated in closed form. However, the constants can be approximated either analytically or numerically.
In [3], Cramér showed that $b_n(Z_{(n)} - b_n)$ converges in distribution to the standard Gumbel distribution, where $Z_{(n)}$ is the maximum of a sample of n independent, identically distributed N(0, 1) random variables and $b_n$ is the normalizing sequence given in [3].
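Since the standard Gumbel distribution has mean equal to $\gamma \approx 0.5772$, the Euler-Mascheroni constant, and variance equal to $\pi^2/6$, we can use these values to approximate
$$k_n \approx b_n + \frac{\gamma}{b_n} \quad \text{and} \quad c_n \approx \frac{\pi^2}{6 b_n^2}. \tag{4}$$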
Fig 1 displays the values of the analytic approximations for $k_n$ and $c_n$ given in Eq (4) for n ranging from 10 to 100,000, along with corresponding numerical approximations, plotted on a semi-log scale. The numerical approximations for $k_n$ and $c_n$ were computed from 10,000 realizations of a simulation of the maximum.
Fig 1. From the top curve to the bottom, the plot displays the values of the analytic approximation for $k_n$ (solid) and the numerical approximation for $k_n$ (dashed), along with the analytic approximation for $c_n$ (dotted) and the numerical approximation for $c_n$ (dot-dashed).
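One way to obtain numerical approximations of this kind is sketched below. It is an assumption about how such a simulation could be set up, not a description of the authors' implementation: instead of drawing n normal variables per realization, it uses the identity that $\Phi^{-1}(U^{1/n})$ has the same distribution as the maximum of n independent N(0, 1) variables when U is uniform on (0, 1), which keeps the cost independent of n. The function names and seed are illustrative.

```python
import random
from statistics import NormalDist, fmean, variance

STD_NORMAL = NormalDist()  # N(0, 1)

def simulate_normal_max(n, rng):
    """Draw one realization of the maximum of n iid N(0, 1) variables.

    If U is uniform on (0, 1), then U**(1/n) has CDF p**n, so applying the
    standard normal quantile function gives a variable with CDF Phi(z)**n.
    """
    while True:
        p = rng.random() ** (1.0 / n)
        if 0.0 < p < 1.0:  # inv_cdf is undefined at 0 and 1; redraw in that rare case
            return STD_NORMAL.inv_cdf(p)

def approximate_kn_cn(n, realizations=10_000, seed=0):
    """Monte Carlo approximations of k_n (mean) and c_n (variance) of the maximum."""
    rng = random.Random(seed)
    draws = [simulate_normal_max(n, rng) for _ in range(realizations)]
    return fmean(draws), variance(draws)

for n in (10, 100, 1000, 10_000, 100_000):
    k_n, c_n = approximate_kn_cn(n)
    print(f"n = {n:6d}: k_n ~ {k_n:.3f}, c_n ~ {c_n:.3f}")
```

The printed values should show $k_n$ growing slowly and $c_n$ shrinking slowly with n, in line with the curves described for Fig 1.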
While Theorem 2 showed that $\hat{\mu}$ is positively biased with the bias approaching zero as $m \to \infty$, the estimation bias is fairly minimal even for relatively small values of m. Fig 2 displays the sampling distributions of the estimators for $\mu$ and $\sigma^2$ for varying values of m and n. In all the simulations, the sampling distribution of $\hat{\mu}$ is fairly centered around the true value of $\mu = 0$. Setting the true value of $\mu$ to a nonzero value simply shifts the sampling distribution of $\hat{\mu}$ and has no effect on $\hat{\sigma}^2$. We also observe from Fig 2 that the variability of the sampling distributions of the estimators decreases as m, the number of sample maxima, increases, but increases as n, the sample size over which the maximum is computed, increases. We derive an analytical justification for this behavior in Theorem 3 below.
Fig 2. Estimates of $\hat{\mu}$ (top) and $\hat{\sigma}^2$ (bottom) from 100 realizations with n, m = 10, 100, and 1000.
The horizontal lines indicate the true values of $\mu$ and $\sigma^2$.
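A single realization of such a simulation can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the constants $k_n$ and $c_n$ are approximated here with a simple trapezoid rule applied to the density $n\,[\Phi(z)]^{n-1}\phi(z)$ of the maximum of n independent N(0, 1) variables, and the true parameter values (μ = 5, σ = 2), the integration range, the function names, and the seed are all hypothetical choices.

```python
import math
import random
from statistics import fmean, variance

def max_normal_constants(n, lo=-10.0, hi=10.0, steps=20_000):
    """Trapezoid-rule approximations of k_n and c_n.

    Integrates z and z**2 against n * Phi(z)**(n-1) * phi(z), the density of
    the maximum of n iid N(0, 1) variables, over the interval [lo, hi].
    """
    h = (hi - lo) / steps
    first = second = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        end_weight = 0.5 if i in (0, steps) else 1.0
        weight = end_weight * h * n * cdf ** (n - 1) * pdf
        first += z * weight
        second += z * z * weight
    return first, second - first**2

def estimate_normal_params(maxima, n):
    """Return (mu_hat, sigma2_hat) from a sample of maxima, each taken over n values."""
    k_n, c_n = max_normal_constants(n)
    sigma2_hat = variance(maxima) / c_n                    # S_Y^2 / c_n
    mu_hat = fmean(maxima) - k_n * math.sqrt(sigma2_hat)   # Y_bar - k_n * sigma_hat
    return mu_hat, sigma2_hat

# Hypothetical true parameters, chosen only for illustration.
rng = random.Random(1)
mu, sigma, n, m = 5.0, 2.0, 10, 1000
maxima = [max(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(m)]
mu_hat, sigma2_hat = estimate_normal_params(maxima, n)
print(f"mu_hat = {mu_hat:.3f} (true {mu}), sigma2_hat = {sigma2_hat:.3f} (true {sigma**2})")
```

Repeating the last four lines over many seeds and several (n, m) pairs would give sampling distributions of the kind summarized in Fig 2.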
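Theorem 3. Let $X_{ij} \sim N(\mu, \sigma^2)$ for $i = 1, \ldots, n$ and $Y_j = \max_{1 \le i \le n} X_{ij}$ for $j = 1, \ldots, m$. Let $\bar{Y}$ and $S_Y^2$ denote the sample mean and sample variance, respectively, of the $Y_j$'s. Set
$$\hat{\mu} = \bar{Y} - k_n \frac{S_Y}{\sqrt{c_n}} \quad \text{and} \quad \hat{\sigma}^2 = \frac{S_Y^2}{c_n},$$
where $k_n$ denotes the mean and $c_n$ denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then
$$\text{var}(\hat{\sigma}^2) = \sigma^4 \left( \frac{2}{m - 1} + \frac{\kappa}{m} \right)$$
and
$$\text{var}(\hat{\mu}) \approx \sigma^2 \left[ \frac{c_n}{m} - \frac{k_n \sqrt{c_n} \, \gamma_1}{m} + \frac{k_n^2}{4} \left( \frac{2}{m - 1} + \frac{\kappa}{m} \right) \right],$$
where $\gamma_1$ is the skewness and $\kappa$ is the excess kurtosis of the distribution of $Y_j$.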
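Proof. The variance of $S_Y^2$ can be computed as
$$\text{var}(S_Y^2) = [\text{var}(Y_j)]^2 \left( \frac{2}{m - 1} + \frac{\kappa}{m} \right),$$
where $\kappa$ is the excess kurtosis of the distribution of $Y_j$ [4]. Since we computed that $\text{var}(Y_j) = \sigma^2 c_n$ in Theorem 2, we obtain the desired result
$$\text{var}(\hat{\sigma}^2) = \frac{\text{var}(S_Y^2)}{c_n^2} = \sigma^4 \left( \frac{2}{m - 1} + \frac{\kappa}{m} \right)$$
for the variance of the variance estimator. Now for the variance of the mean estimator, we compute that
$$\text{var}(\hat{\mu}) = \text{var}(\bar{Y}) + \frac{k_n^2}{c_n} \text{var}(S_Y) - \frac{2 k_n}{\sqrt{c_n}} \text{cov}(\bar{Y}, S_Y).$$
To simplify the above expression, we use the fact that $\text{var}(\bar{Y}) = \sigma^2 c_n / m$, along with the approximations
$$\text{var}(S_Y) \approx \frac{\text{var}(S_Y^2)}{4 \, \text{var}(Y_j)} = \frac{\sigma^2 c_n}{4} \left( \frac{2}{m - 1} + \frac{\kappa}{m} \right) \quad \text{and} \quad \text{cov}(\bar{Y}, S_Y) \approx \frac{\gamma_1 \sigma^2 c_n}{2m},$$
where $\gamma_1$ is the skewness of the distribution of $Y_j$ [5]. Using these approximations, we obtain the desired result
$$\text{var}(\hat{\mu}) \approx \sigma^2 \left[ \frac{c_n}{m} - \frac{k_n \sqrt{c_n} \, \gamma_1}{m} + \frac{k_n^2}{4} \left( \frac{2}{m - 1} + \frac{\kappa}{m} \right) \right]$$
for the variance of the mean estimator.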
As n increases, the value of $\kappa$ increases monotonically from 0, the excess kurtosis of the normal distribution, to 12/5, the excess kurtosis of the Gumbel distribution [1]. Thus, from Theorem 3, we observe that the variance of the sampling distribution of $\hat{\sigma}^2$ increases proportionally to $\sigma^4$ as $\sigma$ increases and decreases proportionally to $1/m$ as m increases, but only slightly increases as n increases.
As with the excess kurtosis, the skewness of the distribution of $Y_j$ increases monotonically from the value for the normal distribution, i.e., $\gamma_1 = 0$, to the value for the standard Gumbel distribution, i.e., $\gamma_1 \approx 1.13955$, as n increases [6]. Since the constant $c_n$ decreases towards 0 while the constant $k_n$ increases towards infinity, the dominant term in the variance of $\hat{\mu}$ increases proportionally to $k_n^2$ as n increases. We also observe from Theorem 3 that the variance of the sampling distribution of $\hat{\mu}$ increases proportionally to $\sigma^2$ as $\sigma$ increases and decreases proportionally to $1/m$ as m increases. These relationships explain the behavior of the sampling distributions displayed in Fig 2.
4 Biological application
During fertilization in flowering plants, once pollen land on the stigma, the pollen will grow tubes that travel down through a transmitting tract from the stigma toward an ovule. Pollen compete against each other in a race towards the limited number of ovules to determine which pollen will father the seeds. The mean length of the population of pollen tubes at various time points is of interest to plant biologists, yet, to date, there are only measures of the lengths of the longest pollen tubes in such competitions [7]. Since the pollen tube lengths must have a positive value, it is reasonable to assume that the lengths follow an exponential distribution. Hence, our method described in Section 2 will allow the mean pollen tube length to be estimated given the structure of the experimental data.
In [7], Swanson et al. measured the longest pollen tube lengths at four time points for two accessions (i.e., specific geographical populations) of Arabidopsis thaliana in a laboratory setting. For both the Columbia and Landsberg accessions, either m = 8 or m = 9 individual plants were used for each time point. The average number of pollen tubes within each plant was n = 933 for the Columbia accession and n = 727 for the Landsberg accession. Table 1 reports the sample mean of the longest pollen tube from the m plants for each accession after 3, 6, 9, and 24 hours. Using these sample means of the longest lengths and the estimator in Eq (1) from Theorem 1, we then estimated the overall mean length for each accession at each time point, with standard errors based on the variance formula in Eq (2). The resulting estimates and their standard errors are listed in Table 1.
Table 1. The sample mean, $\bar{Y}$, of the longest pollen tube from m plants, where each plant contained an average of n pollen tubes, along with the corresponding estimated mean pollen length, $\hat{\beta}$, and its standard error, $\text{SE}(\hat{\beta})$, for two accessions of Arabidopsis thaliana at varying time points.
| Accession | Time (hrs) | m | $\bar{Y}$ (mm) | $\hat{\beta}$ (mm) | $\text{SE}(\hat{\beta})$ (mm) |
|---|---|---|---|---|---|
| Columbia (n = 933) | 3 | 9 | 0.690 | 0.093 | 0.005 |
| | 6 | 8 | 1.069 | 0.144 | 0.009 |
| | 9 | 9 | 2.538 | 0.342 | 0.020 |
| | 24 | 9 | 2.778 | 0.375 | 0.022 |
| Landsberg (n = 727) | 3 | 9 | 0.474 | 0.066 | 0.004 |
| | 6 | 8 | 0.676 | 0.094 | 0.006 |
| | 9 | 8 | 1.795 | 0.251 | 0.016 |
| | 24 | 9 | 2.325 | 0.324 | 0.019 |
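The calculation behind each row of Table 1 can be sketched as follows, using only the reported values of the sample mean, m, and n. The function name is illustrative, and the standard error is assumed to be the square root of the variance formula from Theorem 1 with the estimate substituted for the unknown β.

```python
import math

def exp_mean_estimate(y_bar, m, n):
    """Return (beta_hat, standard_error) from the mean of m sample maxima."""
    h_n = sum(1.0 / i for i in range(1, n + 1))       # harmonic number H_n
    s_n = sum(1.0 / i**2 for i in range(1, n + 1))    # sum of 1 / i^2
    beta_hat = y_bar / h_n
    # Assumed plug-in standard error: sqrt of the variance with beta_hat for beta.
    se = beta_hat * math.sqrt(s_n / m) / h_n
    return beta_hat, se

# (accession, n, time in hours, m, mean of longest tube lengths in mm), from Table 1.
rows = [
    ("Columbia", 933, 3, 9, 0.690),
    ("Columbia", 933, 6, 8, 1.069),
    ("Columbia", 933, 9, 9, 2.538),
    ("Columbia", 933, 24, 9, 2.778),
    ("Landsberg", 727, 3, 9, 0.474),
    ("Landsberg", 727, 6, 8, 0.676),
    ("Landsberg", 727, 9, 8, 1.795),
    ("Landsberg", 727, 24, 9, 2.325),
]
for accession, n, hours, m, y_bar in rows:
    beta_hat, se = exp_mean_estimate(y_bar, m, n)
    print(f"{accession:9s} {hours:2d} h: beta_hat = {beta_hat:.3f} mm, SE = {se:.3f} mm")
```

Because the inputs are the rounded sample means from the table, the output should agree with the tabulated estimates only up to rounding in the last digit.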
To further evaluate the validity of the assumption that the population of pollen tube lengths is exponentially distributed, we produced Q-Q plots of the distribution of the maximum pollen tube length from the laboratory experiments performed by Swanson et al. versus the distribution of the maximum from an exponential distribution. The theoretical distribution of the maximum from an exponential distribution was simulated using 10,000 realizations of $Y = \max_{1 \le i \le n} X_i$, where $X_i \sim \text{Exp}(\hat{\beta})$ for $i = 1, \ldots, n$, using the values of n and $\hat{\beta}$ that are listed in Table 1. The Q-Q plots, displayed in Fig 3, show a roughly linear relationship, supporting the assumption that the underlying distribution of pollen tube lengths is exponential. Moreover, we performed a Kolmogorov-Smirnov test of the equality of the empirical and theoretical distributions for each accession of Arabidopsis thaliana at each time point. The smallest resulting p-value was 0.52 (corresponding to the Landsberg accession at 9 hours), further indicating that there is no evidence that the distribution of pollen tube lengths differs significantly from an exponential distribution.
Fig 3. Q-Q plots of the distribution of the maximum pollen tube length from laboratory experiments versus the distribution of the maximum from an exponential distribution.
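The theoretical side of such a Q-Q comparison can be sketched as below. This covers only the simulated quantiles of the maximum, not the experimental measurements, which are not reproduced here. The sketch assumes the values n = 933 and an estimated mean of 0.093 mm for the Columbia accession at 3 hours from Table 1, and it draws each maximum by inverting its exact distribution function $(1 - e^{-y/\beta})^n$ rather than by simulating n exponential variables; the function names and seed are illustrative.

```python
import math
import random

def exp_max_quantile(p, beta, n):
    """Quantile of the maximum of n iid exponential variables with mean beta.

    Solves (1 - exp(-y / beta))**n = p for y; expm1 keeps 1 - p**(1/n) accurate.
    """
    return -beta * math.log(-math.expm1(math.log(p) / n))

def simulate_exp_max(beta, n, realizations, rng):
    """Sorted realizations of the maximum, drawn by inverse-CDF sampling."""
    draws = []
    for _ in range(realizations):
        u = rng.random()
        while u <= 0.0:  # log(0) is undefined; redraw in that very unlikely case
            u = rng.random()
        draws.append(exp_max_quantile(u, beta, n))
    return sorted(draws)

# Values taken from Table 1 (Columbia accession, 3 hours).
rng = random.Random(2)
beta_hat, n = 0.093, 933
simulated = simulate_exp_max(beta_hat, n, 10_000, rng)
for p in (0.25, 0.50, 0.75):
    sim_q = simulated[int(p * len(simulated))]
    exact_q = exp_max_quantile(p, beta_hat, n)
    print(f"p = {p:.2f}: simulated quantile {sim_q:.3f} mm, exact quantile {exact_q:.3f} mm")
```

Plotting the sorted experimental maxima against the matching simulated quantiles would then give a Q-Q plot of the kind shown in Fig 3.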
Acknowledgments
This work was supported by the National Science Foundation under grant IOS-1645508. We would also like to thank Swanson et al. for providing the data for the biological example.
Data Availability
All relevant data are within the manuscript, Supporting Information files, and on the OSF data repository: https://osf.io/zg4ka/.
Funding Statement
A.C. was supported by the National Science Foundation under grant IOS-1645508 (https://www.nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. David HA, Nagaraja HN. Order statistics. Wiley Online Library; 1970.
- 2. Castillo E, Hadi AS, Balakrishnan N, Sarabia JM. Extreme value and related models with applications in engineering and science. Wiley Hoboken, NJ; 2005.
- 3. Cramér H. Mathematical methods of statistics. Princeton University Press; 1946.
- 4. Mood AM, Graybill FA, Boes DC. Introduction to the theory of statistics. McGraw-Hill; 1974.
- 5. Rao CR. Linear statistical inference and its applications, vol 2. Wiley; New York; 1973.
- 6. Gumbel EJ. Statistics of extremes. Courier Corporation; 2012.
- 7. Swanson RJ, Hammond AT, Carlson AL, Gong H, Donovan TK. Pollen performance traits reveal prezygotic nonrandom mating and interference competition in Arabidopsis thaliana. American Journal of Botany. 2016;103(3):498–513. doi:10.3732/ajb.1500172