PLOS One. 2019 Apr 25;14(4):e0215529. doi: 10.1371/journal.pone.0215529

Using the sample maximum to estimate the parameters of the underlying distribution

Alex Capaldi 1, Tiffany N Kolba 1,*
Editor: Eugene Demidenko
PMCID: PMC6483189  PMID: 31022209

Abstract

We propose novel estimators for the parameters of an exponential distribution and a normal distribution when the only known information is a sample of sample maxima; i.e., the known information consists of a sample of m values, each of which is the maximum of a sample of n independent random variables drawn from the underlying exponential or normal distribution. We analyze the accuracy and precision of the estimators using extreme value theory, as well as through simulations of the sampling distributions. For the exponential distribution, the estimator of the mean is unbiased and its variance decreases as either m or n increases. Likewise, for the normal distribution, we show that the estimator of the mean has negligible bias and the estimator of the variance is unbiased. While the variance of the estimators for the normal distribution decreases as m, the number of sample maxima, increases, the variance increases as n, the sample size over which the maximum is computed, increases. We apply our method to estimate the mean length of pollen tubes in the flowering plant Arabidopsis thaliana, where the known biological information fits our context of a sample of sample maxima.

1 Introduction

Consider the scenario where one has obtained data where each observation is the maximum value of n independent, identically distributed random variables drawn from either an exponential distribution or a normal distribution with unknown parameters. That is, X_ij ~ Exp(β) or X_ij ~ N(μ, σ²) independently for i = 1, …, n and j = 1, …, m, and the known data consist of Y_j = max{X_ij : 1 ≤ i ≤ n} for j = 1, …, m. Here we present a process to estimate the mean β of the underlying exponential distribution, or the mean μ and variance σ² of the underlying normal distribution, from only the set of Y_j's.

Much previous research has been conducted in the field of extreme value theory on the distribution of the maximum from a sample of n independent, identically distributed random variables. In particular, the Fisher-Tippett-Gnedenko theorem states that the distribution of the sample maximum, after proper rescaling, can only converge to one of three types of distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution [1]. When the original underlying distribution is exponential or normal, the limiting distribution of the rescaled sample maximum is the Gumbel distribution. Extreme value theory has been applied in many applications, such as estimating the probability of an extreme flood, severe adverse side effect of a drug, maximum environmental load on a structure, or large insurance loss [2]. In these applications, the underlying distribution and its parameters are typically known and the focus is on estimating the probability distribution of the sample maximum. The focus of our scenario is novel in that we are using a known sample of sample maxima to estimate unknown parameters of the underlying distribution.

Standard techniques for estimating unknown parameters, such as method of moments or maximum likelihood estimation, typically assume that the known information consists of a sample of observations drawn directly from the underlying population distribution. However, in our scenario under consideration, the direct observations are unknown. Rather, we only know the maximum value of each sample of direct observations. The estimators we propose in the following sections are the first, to our knowledge, for estimating unknown population parameters when the only known information is a sample of sample maxima.

2 Estimator for exponential distribution

We begin by considering the case where the underlying distribution is exponential with unknown mean β. In Theorem 1 below, we propose an estimator for β and compute its expected value and variance.

Theorem 1. Let X_ij, for i = 1, …, n and j = 1, …, m, be independent, identically distributed Exp(β) random variables, and let Y_j = max{X_ij : 1 ≤ i ≤ n}. Set

\hat{\beta} = \frac{\bar{Y}}{H_n} = \frac{1}{m H_n} \sum_{j=1}^{m} Y_j. \quad (1)

Then

E(\hat{\beta}) = \beta, \qquad \operatorname{var}(\hat{\beta}) = \frac{\beta^2 G_n}{m H_n^2} \quad (2)

where

H_n = \sum_{i=1}^{n} \frac{1}{i}, \qquad G_n = \sum_{i=1}^{n} \frac{1}{i^2}.

Proof. From the formula for β^ given in Eq (1), it directly follows that

E(\hat{\beta}) = \frac{E(Y_j)}{H_n}, \qquad \operatorname{var}(\hat{\beta}) = \frac{\operatorname{var}(Y_j)}{m H_n^2}.

Hence, it only remains to compute the expected value and variance of the maximum of a single sample of n independent Exp(β) random variables. Let X_(i) denote the i-th smallest observation from such a sample. Then Y_j = X_(n) can be decomposed as the following telescoping sum:

Y_j = X_{(n)} = X_{(1)} + (X_{(2)} - X_{(1)}) + \cdots + (X_{(n)} - X_{(n-1)}).

Due to the memoryless property of the exponential distribution, X_(2) − X_(1) is independent of X_(1). Moreover, while X_(1) is the minimum of n independent Exp(β) random variables, X_(2) − X_(1) can be viewed as the minimum of a sample of n − 1 independent Exp(β) random variables. Likewise, all of the terms in the telescoping sum for Y_j = X_(n) are independent, with X_(i+1) − X_(i) equal in distribution to the minimum of a sample of n − i independent Exp(β) random variables, which in turn is equal in distribution to an Exp(β/(n − i)) random variable. Thus,

E(Y_j) = E(X_{(n)}) = \frac{\beta}{n} + \frac{\beta}{n-1} + \cdots + \frac{\beta}{2} + \beta = \beta \sum_{i=1}^{n} \frac{1}{i} = \beta H_n

and

\operatorname{var}(Y_j) = \operatorname{var}(X_{(n)}) = \frac{\beta^2}{n^2} + \frac{\beta^2}{(n-1)^2} + \cdots + \frac{\beta^2}{2^2} + \beta^2 = \beta^2 \sum_{i=1}^{n} \frac{1}{i^2} = \beta^2 G_n.

Substituting these expressions for E(Yj) and var(Yj) into the equations for the expected value and variance of β^ produces the formulas given in Eq (2).

As a consequence of Theorem 1, we have shown that β̂ is an unbiased estimator for β and that its variance decreases at a rate proportional to 1/m. Since G_n → π²/6 and H_n → ∞ at rate log n as n → ∞, the variance of β̂ decreases at a rate proportional to 1/(log n)². Thus, the precision of the estimator β̂ can be improved more rapidly by increasing m, the number of sample maxima, compared to increasing n, the sample size over which the maximum is computed.
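The claims of Theorem 1 are straightforward to check by Monte Carlo simulation. The sketch below (the values of β, n, m, and the number of repetitions are illustrative choices, not taken from the paper) verifies that β̂ from Eq (1) is unbiased and that its empirical variance matches β²G_n/(mH_n²):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0        # true mean of the underlying exponential (illustrative)
n, m = 50, 10     # sample size per maximum, number of sample maxima
reps = 10000      # Monte Carlo realizations of the estimator

H_n = (1.0 / np.arange(1, n + 1)).sum()        # harmonic number H_n
G_n = (1.0 / np.arange(1, n + 1) ** 2).sum()   # G_n

# Each realization: m maxima, each the max of n Exp(beta) draws.
Y = rng.exponential(beta, size=(reps, m, n)).max(axis=2)
beta_hat = Y.mean(axis=1) / H_n                # estimator from Eq (1)

print(beta_hat.mean())                         # close to beta = 2.0
print(beta_hat.var(), beta**2 * G_n / (m * H_n**2))  # nearly equal
```

The empirical mean of β̂ lands on β and the empirical variance matches the Eq (2) formula to within Monte Carlo error.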

3 Estimators for normal distribution

We now consider the case where the underlying distribution is normal with unknown mean μ and unknown variance σ2. In Theorem 2 below, we propose estimators for μ and σ2 and analyze their expected value, while in Theorem 3 we analyze the variance of the estimators.

Theorem 2. Let X_ij, for i = 1, …, n and j = 1, …, m, be independent, identically distributed N(μ, σ²) random variables, and let Y_j = max{X_ij : 1 ≤ i ≤ n}. Let Ȳ and S_Y² denote the sample mean and sample variance, respectively, of the Y_j's. Set

\hat{\mu} = \bar{Y} - \frac{k_n}{\sqrt{c_n}} S_Y, \qquad \hat{\sigma}^2 = \frac{S_Y^2}{c_n} \quad (3)

where k_n denotes the mean and c_n denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then E(σ̂²) = σ², while E(μ̂) > μ with E(μ̂) → μ as m → ∞.

Proof. The cumulative distribution function of Yj is given by

F_{Y_j}(y) = P(Y_j \le y) = [P(X_{ij} \le y)]^n = \left[ P\!\left( \frac{X_{ij} - \mu}{\sigma} \le \frac{y - \mu}{\sigma} \right) \right]^n = \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^n.

Differentiating, the probability density function of Yj is

f_{Y_j}(y) = n \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^{n-1} \phi\!\left( \frac{y - \mu}{\sigma} \right) \frac{1}{\sigma},

where Φ(z) denotes the cumulative distribution function and ϕ(z) denotes the probability density function of a N(0, 1) random variable. The expected value of Yj can then be calculated as

E(Y_j) = \int_{-\infty}^{\infty} y\, f_{Y_j}(y)\, dy = \int_{-\infty}^{\infty} y\, n \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^{n-1} \phi\!\left( \frac{y - \mu}{\sigma} \right) \frac{1}{\sigma}\, dy = \mu + \sigma \int_{-\infty}^{\infty} z\, n [\Phi(z)]^{n-1} \phi(z)\, dz = \mu + \sigma k_n.

We also obtain that the variance of Yj is

\operatorname{var}(Y_j) = \int_{-\infty}^{\infty} (y - E(Y_j))^2 f_{Y_j}(y)\, dy = \int_{-\infty}^{\infty} (y - \mu - \sigma k_n)^2\, n \left[ \Phi\!\left( \frac{y - \mu}{\sigma} \right) \right]^{n-1} \phi\!\left( \frac{y - \mu}{\sigma} \right) \frac{1}{\sigma}\, dy = \sigma^2 \int_{-\infty}^{\infty} (z - k_n)^2\, n [\Phi(z)]^{n-1} \phi(z)\, dz = \sigma^2 c_n.

We can now use the equations for E(Y_j) and var(Y_j) to compute the expected values of the estimators. In particular, the expected value of the estimator of the variance is

E(\hat{\sigma}^2) = \frac{E(S_Y^2)}{c_n} = \frac{\operatorname{var}(Y_j)}{c_n} = \frac{\sigma^2 c_n}{c_n} = \sigma^2,

while the expected value of the estimator of the mean satisfies

E(\hat{\mu}) = E(Y_j) - \frac{k_n}{\sqrt{c_n}} E(S_Y) > (\mu + \sigma k_n) - \frac{k_n}{\sqrt{c_n}} (\sigma \sqrt{c_n}) = \mu.

Although Jensen’s inequality implies that μ̂ has positive bias, since E(S_Y) < √(E(S_Y²)) = √(var(Y_j)), the bias of the sample standard deviation goes to zero as the sample size increases. Hence, E(μ̂) → μ as m → ∞.

Note that the constants k_n and c_n that appear in the formulas for the estimators depend only upon the sample size n. Exact integral expressions for k_n and c_n are given in the proof of Theorem 2, but the integrals cannot be evaluated in closed form. However, the constants can be approximated either analytically or numerically.

In [3], Cramér showed that b_n(Z_(n) − b_n) converges in distribution to the standard Gumbel distribution, where Z_(n) is the maximum of a sample of n independent, identically distributed N(0, 1) random variables and

b_n = (2 \log n)^{1/2} - \frac{\log(4\pi \log n)}{2 (2 \log n)^{1/2}}.

Since the standard Gumbel distribution has mean equal to γ ≈ 0.5772, the Euler-Mascheroni constant, and variance equal to π²/6, we can use these values to approximate

k_n \approx \frac{\gamma}{b_n} + b_n, \qquad c_n \approx \frac{\pi^2}{6 b_n^2}. \quad (4)

Fig 1 displays the values of the analytic approximations for k_n and c_n given in Eq (4) for n ranging from 10 to 100,000, along with corresponding numerical approximations, plotted on a semi-log scale. The numerical approximations for k_n and c_n were computed from 10,000 realizations of a simulation of the maximum.
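Both approximation routes can be sketched in a few lines; here n = 1000 and the simulation size are illustrative choices. Consistent with the gap visible in Fig 1, the analytic values from Eq (4) come out slightly above the numerical ones at moderate n:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # illustrative sample size

# Analytic approximation via Cramer's normalizing constant b_n and Eq (4).
b_n = np.sqrt(2 * np.log(n)) - np.log(4 * np.pi * np.log(n)) / (2 * np.sqrt(2 * np.log(n)))
gamma = 0.57721566  # Euler-Mascheroni constant
k_analytic = gamma / b_n + b_n
c_analytic = np.pi**2 / (6 * b_n**2)

# Numerical approximation from 10,000 simulated maxima of n N(0,1) draws.
Z = rng.standard_normal((10000, n)).max(axis=1)
k_numeric, c_numeric = Z.mean(), Z.var()

print(k_analytic, k_numeric)  # analytic slightly above numerical
print(c_analytic, c_numeric)
```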

Fig 1. From the top curve to the bottom, the plot displays the values of the analytic approximation for k_n (solid) and the numerical approximation for k_n (dashed), along with the analytic approximation for c_n (dotted) and the numerical approximation for c_n (dot-dashed).


While Theorem 2 showed that μ̂ is positively biased, with the bias approaching zero as m → ∞, the estimation bias is fairly minimal even for relatively small values of m. Fig 2 displays the sampling distributions of the estimators for μ and σ² for varying values of m and n. In all the simulations, the sampling distribution of μ̂ is fairly centered around the true value of μ = 0. Setting the true value of μ to a nonzero value simply shifts the sampling distribution of μ̂ and has no effect on σ̂². We also observe from Fig 2 that the variability of the sampling distributions of the estimators decreases as m, the number of sample maxima, increases, but increases as n, the sample size over which the maximum is computed, increases. We derive an analytical justification for this behavior in Theorem 3 below.
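A minimal simulation of these sampling distributions, with illustrative parameter choices and with k_n and c_n approximated numerically rather than from Eq (4), confirms that the bias of μ̂ is negligible and that σ̂² is unbiased:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 50        # illustrative sample sizes
mu, sigma = 0.0, 1.0 # true parameters of the underlying normal

# Numerically approximate k_n and c_n (mean and variance of the max of n N(0,1)).
Z = rng.standard_normal((100000, n)).max(axis=1)
k_n, c_n = Z.mean(), Z.var()

# Sampling distributions of the estimators in Eq (3) over many realizations.
reps = 2000
Y = mu + sigma * rng.standard_normal((reps, m, n)).max(axis=2)
S = Y.std(axis=1, ddof=1)              # sample std dev of the m maxima
mu_hat = Y.mean(axis=1) - k_n / np.sqrt(c_n) * S
sig2_hat = S**2 / c_n

print(mu_hat.mean())    # small positive bias, near mu = 0
print(sig2_hat.mean())  # near sigma^2 = 1
```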

Fig 2. Estimates of μ̂ (top) and σ̂² (bottom) from 100 realizations with n, m = 10, 100, and 1000.


The horizontal lines indicate the true values of μ and σ2.

Theorem 3. Let X_ij, for i = 1, …, n and j = 1, …, m, be independent, identically distributed N(μ, σ²) random variables, and let Y_j = max{X_ij : 1 ≤ i ≤ n}. Let Ȳ and S_Y² denote the sample mean and sample variance, respectively, of the Y_j's. Set

\hat{\mu} = \bar{Y} - \frac{k_n}{\sqrt{c_n}} S_Y, \qquad \hat{\sigma}^2 = \frac{S_Y^2}{c_n}

where k_n denotes the mean and c_n denotes the variance of the maximum of n independent, identically distributed N(0, 1) random variables. Then

\operatorname{var}(\hat{\sigma}^2) = \sigma^4 \left( \frac{2}{m-1} + \frac{\kappa}{m} \right)

and

\operatorname{var}(\hat{\mu}) \approx \frac{\sigma^2 c_n}{m} + \frac{\sigma^2 k_n^2}{4} \left( \frac{2}{m-1} + \frac{\kappa}{m} \right) - \frac{\sigma^2 k_n \sqrt{c_n}\, \gamma_1}{m}

where γ₁ is the skewness and κ is the excess kurtosis of the distribution of Y_j.

Proof. The variance of S_Y² can be computed as

\operatorname{var}(S_Y^2) = (\operatorname{var}(Y_j))^2 \left( \frac{2}{m-1} + \frac{\kappa}{m} \right),

where κ is the excess kurtosis of the distribution of Y_j [4]. Since we computed that var(Y_j) = σ²c_n in Theorem 2, we obtain the desired result

\operatorname{var}(\hat{\sigma}^2) = \operatorname{var}\!\left( \frac{S_Y^2}{c_n} \right) = \sigma^4 \left( \frac{2}{m-1} + \frac{\kappa}{m} \right)

for the variance of the variance estimator. Now for the variance of the mean estimator, we compute that

\operatorname{var}(\hat{\mu}) = \operatorname{var}\!\left( \bar{Y} - \frac{k_n}{\sqrt{c_n}} S_Y \right) = \operatorname{var}(\bar{Y}) + \frac{k_n^2}{c_n} \operatorname{var}(S_Y) - \frac{2 k_n}{\sqrt{c_n}} \operatorname{cov}(\bar{Y}, S_Y).

To simplify the above expression, we use the fact that var(Ȳ) = var(Y_j)/m, along with the approximations

\operatorname{var}(S_Y) \approx \frac{\operatorname{var}(S_Y^2)}{4 \operatorname{var}(Y_j)}, \qquad \operatorname{cov}(\bar{Y}, S_Y) \approx \frac{\operatorname{cov}(\bar{Y}, S_Y^2)}{2 \sqrt{\operatorname{var}(Y_j)}} = \frac{\operatorname{var}(Y_j)\, \gamma_1}{2m},

where γ₁ is the skewness of the distribution of Y_j [5]. Using these approximations, we obtain the desired result

\operatorname{var}(\hat{\mu}) \approx \frac{\sigma^2 c_n}{m} + \frac{\sigma^2 k_n^2}{4} \left( \frac{2}{m-1} + \frac{\kappa}{m} \right) - \frac{\sigma^2 k_n \sqrt{c_n}\, \gamma_1}{m}

for the variance of the mean estimator.

As n increases, the value of κ increases monotonically from 0, the excess kurtosis of the normal distribution, to 12/5, the excess kurtosis of the Gumbel distribution [1]. Thus, from Theorem 3, we observe that the variance of the sampling distribution of σ̂² increases proportionally to σ⁴ as σ increases and decreases proportionally to 1/m as m increases, but only slightly increases as n increases.

As with the excess kurtosis, the skewness of the distribution of Y_j increases monotonically from the value for the normal distribution, i.e., γ₁ = 0, to the value for the standard Gumbel distribution, i.e., γ₁ ≈ 1.13955, as n increases [6]. Since the constant c_n decreases towards 0 while the constant k_n increases towards infinity, the dominant term in the variance of μ̂ increases proportionally to k_n² as n increases. We also observe from Theorem 3 that the variance of the sampling distribution of μ̂ increases proportionally to σ² as σ increases and decreases proportionally to 1/m as m increases. These relationships explain the behavior of the sampling distributions displayed in Fig 2.
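The variance formula for σ̂² in Theorem 3 can also be checked numerically. In the sketch below (illustrative n and m, with σ = 1, and with c_n and κ themselves estimated by simulation rather than taken from any table), the empirical variance of σ̂² agrees with the predicted σ⁴(2/(m − 1) + κ/m):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 20, 30      # illustrative sample sizes (sigma = 1)
reps = 5000        # Monte Carlo realizations of sigma^2-hat

# Estimate c_n and the excess kurtosis kappa of Y = max of n N(0,1) draws.
Z = rng.standard_normal((200000, n)).max(axis=1)
c_n = Z.var()
kappa = ((Z - Z.mean())**4).mean() / Z.var()**2 - 3

# Empirical variance of sigma^2-hat across many realizations.
Y = rng.standard_normal((reps, m, n)).max(axis=2)
sig2_hat = Y.var(axis=1, ddof=1) / c_n

predicted = 2 / (m - 1) + kappa / m   # Theorem 3 with sigma^4 = 1
print(sig2_hat.var(), predicted)      # nearly equal
```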

4 Biological application

During fertilization in flowering plants, once pollen grains land on the stigma, they grow tubes that travel down through a transmitting tract from the stigma toward an ovule. Pollen compete against each other in a race toward the limited number of ovules, which determines which pollen will father the seeds. The mean length of the population of pollen tubes at various time points is of interest to plant biologists, yet, to date, only the lengths of the longest pollen tubes in such competitions have been measured [7]. Since pollen tube lengths must be positive, it is reasonable to assume that the lengths follow an exponential distribution. Hence, our method described in Section 2 allows the mean pollen tube length to be estimated given the structure of the experimental data.

In [7], Swanson et al. measured the longest pollen tube lengths at four time points for two accessions (i.e., specific geographical populations) of Arabidopsis thaliana in a laboratory setting. For both the Columbia and Landsberg accessions, either m = 8 or m = 9 individual plants were used for each time point. The average number of pollen tubes within each plant was n = 933 for the Columbia accession and n = 727 for the Landsberg accession. Table 1 reports the sample mean of the longest pollen tube from the m plants for each accession after 3, 6, 9, and 24 hours. Using these sample means of the longest lengths and the estimator in Eq (1) of Theorem 1, we then estimated the overall mean length for each accession at each time point, with standard errors computed from the variance formula in Eq (2). The resulting estimates and their standard errors are listed in Table 1.

Table 1. The sample mean, ȳ, of the longest pollen tube from m plants, where each plant contained an average of n pollen tubes, along with the corresponding estimated mean pollen tube length, β̂, and its standard error, SE(β̂), for two accessions of Arabidopsis thaliana at varying time points.

Accession            time (hrs)   m   ȳ (mm)   β̂ (mm)   SE(β̂) (mm)
Columbia (n = 933)        3       9   0.690     0.093     0.005
                          6       8   1.069     0.144     0.009
                          9       9   2.538     0.342     0.020
                         24       9   2.778     0.375     0.022
Landsberg (n = 727)       3       9   0.474     0.066     0.004
                          6       8   0.676     0.094     0.006
                          9       8   1.795     0.251     0.016
                         24       9   2.325     0.324     0.019
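As an illustration of how the entries of Table 1 follow from Theorem 1, the Columbia 3-hour row can be reproduced directly from ȳ, m, and n:

```python
import numpy as np

# Reproduce the Columbia accession, 3-hour row of Table 1.
n, m = 933, 9      # average pollen tubes per plant, number of plants
ybar = 0.690       # sample mean of the longest tube length (mm)

H_n = (1.0 / np.arange(1, n + 1)).sum()        # harmonic number H_n
G_n = (1.0 / np.arange(1, n + 1) ** 2).sum()   # G_n

beta_hat = ybar / H_n                   # Eq (1): estimated mean tube length
se = beta_hat * np.sqrt(G_n / m) / H_n  # standard error from Eq (2)

print(round(beta_hat, 3), round(se, 3))  # 0.093 0.005
```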

To further evaluate the validity of the assumption that the population of pollen tube lengths is exponentially distributed, we produced Q-Q plots of the distribution of the maximum pollen tube length from the laboratory experiments performed by Swanson et al. versus the distribution of the maximum from an exponential distribution. The theoretical distribution of the maximum from an exponential distribution was simulated using 10,000 realizations of Y_j = max{X_ij : 1 ≤ i ≤ n}, where the X_ij are independent, identically distributed Exp(β̂) random variables, using the values of n and β̂ listed in Table 1. The Q-Q plots, displayed in Fig 3, show a roughly linear relationship, supporting the assumption that the underlying distribution of pollen tube lengths is exponential. Moreover, we performed a Kolmogorov-Smirnov test of the equality of the empirical and theoretical distributions for each accession of Arabidopsis thaliana at each time point. The smallest resulting p-value was 0.52 (corresponding to the Landsberg accession at 9 hours), further indicating that there is no evidence that the distribution of pollen tube lengths differs significantly from an exponential distribution.
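The observed maxima from [7] are not reproduced here, but the theoretical side of such a Q-Q plot can be sketched as follows, using the Columbia 3-hour values from Table 1 (the quantile probabilities below are one conventional plotting-position choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 933, 9        # Columbia accession at 3 hours
beta_hat = 0.093     # estimated mean tube length (mm) from Table 1

# Simulate the theoretical distribution of the maximum of n Exp(beta_hat)
# tube lengths, using 10,000 realizations as in the text.
Y_sim = rng.exponential(beta_hat, size=(10000, n)).max(axis=1)

# Theoretical quantiles to plot against the m observed maxima
# (the observed values come from the Swanson et al. data, not shown here).
probs = (np.arange(1, m + 1) - 0.5) / m
qq_theory = np.quantile(Y_sim, probs)
print(qq_theory)
```

Plotting these m theoretical quantiles against the sorted observed maxima yields one panel of Fig 3.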

Fig 3. Q-Q plots of the distribution of the maximum pollen tube length from laboratory experiments versus the distribution of the maximum from an exponential distribution.


Acknowledgments

This work was supported by the National Science Foundation under grant IOS-1645508. We would also like to thank Swanson et al. for providing the data for the biological example.

Data Availability

All relevant data are within the manuscript, Supporting Information files, and on the OSF data repository: https://osf.io/zg4ka/.

Funding Statement

A.C. was supported by the National Science Foundation under grant IOS-1645508 (https://www.nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. David HA, Nagaraja HN. Order statistics. Wiley; 1970.
  • 2. Castillo E, Hadi AS, Balakrishnan N, Sarabia JM. Extreme value and related models with applications in engineering and science. Hoboken, NJ: Wiley; 2005.
  • 3. Cramér H. Mathematical methods of statistics. Princeton University Press; 1946.
  • 4. Mood AM, Graybill FA, Boes DC. Introduction to the theory of statistics. McGraw-Hill; 1974.
  • 5. Rao CR. Linear statistical inference and its applications. 2nd ed. New York: Wiley; 1973.
  • 6. Gumbel EJ. Statistics of extremes. Courier Corporation; 2012.
  • 7. Swanson RJ, Hammond AT, Carlson AL, Gong H, Donovan TK. Pollen performance traits reveal prezygotic nonrandom mating and interference competition in Arabidopsis thaliana. American Journal of Botany. 2016 Feb;103(3):498–513. doi: 10.3732/ajb.1500172


