Biometrika. 2016 May 6;103(2):273–286. doi: 10.1093/biomet/asw011

Nonparametric maximum likelihood estimation for the multisample Wicksell corpuscle problem

Kwun Chuen Gary Chan and Jing Qin
PMCID: PMC4890128  PMID: 27279657

Abstract

We study nonparametric maximum likelihood estimation for the distribution of spherical radii using samples containing a mixture of one-dimensional, two-dimensional biased and three-dimensional unbiased observations. Since direct maximization of the likelihood function is intractable, we propose an expectation-maximization algorithm for implementing the estimator, which handles an indirect measurement problem and a sampling bias problem separately in the E- and M-steps, and circumvents the need to solve an Abel-type integral equation, which creates numerical instability in the one-sample problem. Extensions to ellipsoids are studied and connections to multiplicative censoring are discussed.

Keywords: Abel-type integral equation, Expectation-maximization algorithm, Indirect measurement, Particle size

1. Introduction

Wicksell (1925) studied the estimation of the size of spherical corpuscles found in anatomical samples of tissues in organs such as the spleen, thymus and pancreas. The samples were prepared as thin cross-sections of tissue, in which circular profiles of the corpuscles were observed. Wicksell identified two statistical challenges, an indirect measurement problem and a sampling bias problem, which are described in detail in §2. He proposed a mathematical formulation in which the distribution of the spherical radii of interest and the distribution of the two-dimensional observations are related through an Abel-type integral equation. However, the problem is ill-posed: a naive estimator defined by plugging in the empirical distribution of the observations is often badly behaved, as illustrated in Fig. 1(a).
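To make the ill-posedness concrete, here is a minimal Python sketch of one natural naive plug-in estimator. It is an assumption for illustration and not necessarily the estimator used for Fig. 1(a). Write Wicksell's relation as $g(y) = (y/\mu_1)\int_y^\infty f(x)(x^2-y^2)^{-1/2}\,dx$, where $g$ is the profile density and $\mu_1$ the mean sphere radius. Abel inversion then gives $1 - F(x) = (2\mu_1/\pi)\int_x^\infty g(y)(y^2-x^2)^{-1/2}\,dy$, and setting $x = 0$ gives $\mu_1 = \pi/\{2E(1/Y)\}$. Replacing $g$ by the empirical distribution of the profile radii yields $\hat F(x) = 1 - \sum_{Y_j > x}(Y_j^2-x^2)^{-1/2}\big/\sum_j Y_j^{-1}$. The function name `naive_plugin_cdf` and the simulation settings are hypothetical.

```python
import math
import random


def naive_plugin_cdf(x, profiles):
    """Naive plug-in estimate of F(x) from planar profile radii Y_1, ..., Y_n.

    Abel inversion: 1 - F(x) = E[(Y^2 - x^2)^(-1/2) I(Y > x)] / E[1/Y],
    with both expectations replaced by sample averages.
    """
    denom = sum(1.0 / y for y in profiles)
    numer = sum(1.0 / math.sqrt(y * y - x * x) for y in profiles if y > x)
    return 1.0 - numer / denom


# Hypothetical simulation: radii X ~ Uniform(0, 1), so a sphere hit by the plane
# has the size-biased density 2x on (0, 1) and is cut at a uniform offset.
rng = random.Random(0)
profiles = []
for _ in range(500):
    radius = math.sqrt(rng.random())      # size-biased sphere radius
    offset = radius * rng.random()        # distance from the centre to the plane
    profiles.append(math.sqrt(radius ** 2 - offset ** 2))

grid = [i / 10 for i in range(11)]
print([round(naive_plugin_cdf(x, profiles), 3) for x in grid])
```

Just below each observed profile radius, the term $(Y_j^2 - x^2)^{-1/2}$ explodes. The plug-in estimate is therefore neither monotone nor confined to $[0, 1]$.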

Fig. 1.


Comparison of a naive plug-in estimator (a) and the proposed estimator (b) of the distribution function of spherical radii, based on a sample of two-dimensional observations. The sample is generated from a standard uniform distribution.

The Wicksell corpuscle problem has numerous applications, because many practical situations involve indirect measurement of three-dimensional objects. Although recent technological advances have made direct three-dimensional measurement possible, indirect measurements from lower dimensions are still widely used for practical reasons. Tucker et al. (2012) compared the radius distributions of particles in the nickel-based superalloy Inconel 100 obtained from two-dimensional and three-dimensional measurements. With these different technologies, observations from different dimensions can now be combined to improve estimation accuracy.

The statistical literature on the corpuscle problem is vast; see Chiu et al. (2013) for a comprehensive review. It mainly considers one-sample estimation with two-dimensional observations, and many statistical and numerical procedures have been developed to overcome the ill-posed nature of the problem. For example, Hall & Smith (1988), Van Es & Hoogendoorn (1990) and Golubev & Levit (1998) considered kernel smoothing methods, Nychka et al. (1984) studied a spline-based method, Antoniadis et al. (2001) proposed a wavelet method, and Groeneboom & Jongbloed (1995) considered an isotonized estimator. Many other statistical and numerical methods are surveyed in Chiu et al. (2013), who comment that no method has a clear advantage. While each method has distinctive merits in one-sample estimation, these methods typically regularize estimators obtained from an Abel-type integral equation, and such a representation does not extend readily to multiple samples.

We study maximum likelihood estimation in the multisample problem, where observations are collected from a combination of one-dimensional, two-dimensional and three-dimensional samples. Direct maximization of the likelihood function is intractable. We therefore present an EM algorithm that combines observations from the three samples, and whose special cases yield new two-sample and one-sample estimators. In one-sample Wicksell problems, Vardi et al. (1985), Wilson (1989) and Silverman et al. (1990) developed EM-type algorithms, but their approaches differ from the proposed method in key aspects. First, all three papers considered a categorical data model, discretizing the domain of interest into a finite number of bins and applying the EM algorithm for categorical data (Little & Rubin, 2002, Ch. 13). To reduce the arbitrariness of binning, they applied smoothing after each iteration. In contrast, our method requires neither binning nor smoothing and estimates the distribution function directly. Second, our method does not require numerical inversion of a discretized Abel-type integral equation. As a by-product, the algorithm ensures that the resulting nonparametric maximum likelihood estimator is monotone, as shown in Fig. 1(b).

2. Notation and problem formulation

Suppose that spherical particles of different radii are randomly distributed in $\mathbb{R}^3$, where the centres of the spheres are distributed according to a stationary spatial Poisson process. While the Poisson process assumption is imposed in most statistical papers, it can be relaxed, as discussed in §3. We are interested in estimating the distribution function of the radii $X$ of the particles, which is assumed to have a finite second moment and a density function $f$. Suppose we sample the spheres using a two-dimensional planar probe and observe only the circular profiles of the spheres intersecting the plane. Let $Y$ be the radii of the two-dimensional circular profiles. Two statistical challenges are present. First, spheres with larger radii are more likely to be sampled: a sphere is hit by the plane with probability proportional to its radius. Therefore, the sampling density of the radii of the sampled spheres is

$$\frac{x f(x)}{\mu_1}, \qquad x > 0, \qquad \mu_1 = \int_0^\infty x f(x)\,dx.$$

Second, the radii $Y$ of the two-dimensional circular profiles are indirect measurements and are always smaller than the radii of the spheres being sampled. Given that a sphere with radius $x$ is being sampled, Wicksell (1925) showed that the conditional density of $Y$ is

$$\frac{y}{x\,(x^2-y^2)^{1/2}}, \qquad 0 < y < x.$$

Therefore, the sampling density of $Y$ is given by

$$g(y) = \frac{y}{\mu_1}\int_y^\infty \frac{f(x)}{(x^2-y^2)^{1/2}}\,dx, \qquad y > 0. \tag{1}$$

Suppose, in addition, that we sample using a one-dimensional linear probe. Let $Z$ be the half-length of the traces where the linear probe intersects the spheres. The spheres are now sampled proportionally to their squared radii, with sampling density

$$\frac{x^2 f(x)}{\mu_2}, \qquad x > 0, \qquad \mu_2 = \int_0^\infty x^2 f(x)\,dx,$$

and the half-lengths are again always smaller than the radii of the spheres. Given that a sphere with radius $x$ is sampled, Watson (1971) showed that the sampling density of $Z$ given $X = x$ is

$$\frac{2z}{x^2}, \qquad 0 < z < x,$$

so the sampling density of $Z$ is

$$h(z) = \frac{2z}{\mu_2}\int_z^\infty f(x)\,dx, \qquad z > 0.$$
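The two sampling mechanisms can be checked with a quick Monte Carlo sketch. It assumes, purely for illustration, radii $X \sim$ Uniform(0, 1) and sphere centres placed uniformly in a region large enough that no sphere crosses its edge. A sphere is then hit with probability proportional to $x$ by the plane and to $x^2$ by the line. The observed sections are generated from the geometry rather than from the formulas above. The theoretical means in the final comment follow from $E(Y \mid X = x) = \pi x/4$ and $E(Z \mid X = x) = 2x/3$. The names `simulate_probes` and `mean` are hypothetical.

```python
import math
import random


def simulate_probes(n_spheres=200_000, seed=1):
    """Monte Carlo of planar and linear probes hitting spheres with radii X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    planar_x, planar_y, linear_x, linear_z = [], [], [], []
    for _ in range(n_spheres):
        x = rng.random()                      # radius of a sphere in the population
        # Planar probe {height = 0}; centre height uniform on (-1, 1).
        # The sphere is hit iff |height| < x, i.e. with probability x.
        height = rng.uniform(-1.0, 1.0)
        if abs(height) < x:
            planar_x.append(x)
            planar_y.append(math.sqrt(x * x - height * height))   # profile radius Y
        # Linear probe along one axis; centre offset uniform on the square (-1, 1)^2.
        # The sphere is hit iff the offset lies in a disc of radius x: probability pi x^2 / 4.
        a, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        r2 = a * a + b * b
        if r2 < x * x:
            linear_x.append(x)
            linear_z.append(math.sqrt(x * x - r2))                # half-length Z
    return planar_x, planar_y, linear_x, linear_z


def mean(values):
    return sum(values) / len(values)


planar_x, planar_y, linear_x, linear_z = simulate_probes()
# Theory for Uniform(0, 1) radii:
#   planar: E(sampled X) = E X^2 / E X = 2/3,   E(Y) = (pi/4)(2/3) = pi/6
#   linear: E(sampled X) = E X^3 / E X^2 = 3/4, E(Z) = (2/3)(3/4) = 1/2
print(mean(planar_x), mean(planar_y), mean(linear_x), mean(linear_z))
```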

We are interested in estimating $F$, the population distribution function of $X$. Suppose we have an unbiased three-dimensional random sample of independent observations $X_1,\dots,X_{n_3}$, an independent two-dimensional cross-sectional sample $Y_1,\dots,Y_{n_2}$, and a one-dimensional linear probe sample $Z_1,\dots,Z_{n_1}$. The total sample size is $n = n_1 + n_2 + n_3$. The likelihood function is

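$$L(F) = \prod_{i=1}^{n_3} dF(X_i)\,\prod_{j=1}^{n_2} \frac{Y_j}{\mu_1}\int_{Y_j}^\infty \frac{dF(x)}{(x^2-Y_j^2)^{1/2}}\,\prod_{k=1}^{n_1} \frac{2Z_k}{\mu_2}\int_{Z_k}^\infty dF(x), \qquad \mu_r = \int_0^\infty x^r\,dF(x)\ (r = 1, 2). \tag{2}$$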

While direct maximization of (2) appears to be intractable, we capitalize on a subtle difference between the indirect measurement and the sampling bias problems to derive an EM algorithm for nonparametric maximum likelihood estimation.
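For a discrete $F$ with masses $p_l$ at points $t_l$, the logarithm of (2) can be evaluated directly, which is useful for monitoring any maximization algorithm. The Python sketch below makes two assumptions. The three-dimensional radii are among the support points, and the linear-probe factor $\mathrm{pr}(X \geq Z_k)$ is taken as the mass on $[Z_k, \infty)$. The function name `observed_loglik` is hypothetical.

```python
import math


def observed_loglik(support, probs, x3, y2, z1):
    """Log of likelihood (2) for a discrete F with masses `probs` on `support`.

    x3: three-dimensional radii (must be support points); y2: planar profile radii;
    z1: linear-probe half-lengths. Returns -inf when some factor of (2) is zero.
    """
    mu1 = sum(t * p for t, p in zip(support, probs))          # mean radius
    mu2 = sum(t * t * p for t, p in zip(support, probs))      # second moment
    mass = dict(zip(support, probs))
    ll = 0.0
    for x in x3:                                               # dF(X_i)
        if mass.get(x, 0.0) <= 0.0:
            return -math.inf
        ll += math.log(mass[x])
    for y in y2:                    # (Y_j / mu1) * sum_l p_l (t_l^2 - Y_j^2)^(-1/2) over t_l > Y_j
        s = sum(p / math.sqrt(t * t - y * y) for t, p in zip(support, probs) if t > y)
        if s <= 0.0:
            return -math.inf
        ll += math.log(y / mu1 * s)
    for z in z1:                    # (2 Z_k / mu2) * pr(X >= Z_k)
        surv = sum(p for t, p in zip(support, probs) if t >= z)
        if surv <= 0.0:
            return -math.inf
        ll += math.log(2.0 * z / mu2 * surv)
    return ll


support, probs = [1.0, 2.0, 3.0], [0.2, 0.3, 0.5]
print(observed_loglik(support, probs, [1.0], [2.5], [2.0]))
```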

3. Maximum likelihood estimation and the EM algorithm

To shed light on how to perform maximum likelihood estimation for this problem, we first suppose that we can observe the radii $\tilde X_1,\dots,\tilde X_{n_2}$ of the spheres sampled by the two-dimensional planar probe and the radii $X^*_1,\dots,X^*_{n_1}$ of the spheres sampled by the one-dimensional linear probe. Based on the hypothetical complete observations of radii $X_i$, $\tilde X_j$ and $X^*_k$ for the three samples, the complete-data likelihood,

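$$L_c(F) = \prod_{i=1}^{n_3} dF(X_i)\,\prod_{j=1}^{n_2}\frac{\tilde X_j\,dF(\tilde X_j)}{\mu_1}\,\prod_{k=1}^{n_1}\frac{(X^*_k)^2\,dF(X^*_k)}{\mu_2}, \tag{3}$$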

is a biased-sample likelihood function, as in Vardi (1985), who also showed that the nonparametric maximum likelihood estimator of $F$ based on (3) assigns point mass only to distinct data points of the combined samples. Let $t_1 < \cdots < t_m$ be the observed distinct data points, $p_l = dF(t_l)$ $(l = 1,\dots,m)$, and let $m_{3l}$, $m_{2l}$ and $m_{1l}$ denote the multiplicities of the three-dimensional sample, two-dimensional sample and one-dimensional sample at $t_l$ respectively, i.e., $m_{3l} = \sum_{i=1}^{n_3} I(X_i = t_l)$, $m_{2l} = \sum_{j=1}^{n_2} I(\tilde X_j = t_l)$ and $m_{1l} = \sum_{k=1}^{n_1} I(X^*_k = t_l)$. Then the nonparametric maximum likelihood estimator based on the complete $X_i$, $\tilde X_j$ and $X^*_k$ can be found by maximizing

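$$\prod_{l=1}^{m} p_l^{m_{3l}}\left(\frac{t_l\,p_l}{\sum_{s=1}^{m} t_s\,p_s}\right)^{m_{2l}}\left(\frac{t_l^2\,p_l}{\sum_{s=1}^{m} t_s^2\,p_s}\right)^{m_{1l}} \tag{4}$$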

subject to the constraints

$$p_l \geq 0 \quad (l = 1,\dots,m), \qquad \sum_{l=1}^{m} p_l = 1. \tag{5}$$

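Therefore, if the radii $\tilde X_j$ and $X^*_k$ were observed in all samples, the estimation of $F$ would reduce to that in Vardi (1985) for biased sampling problems. For these problems, Mallows (1985) proposed a computationally efficient algorithm, which Davidov & Iliopoulos (2010) showed converges to the nonparametric maximum likelihood estimate. However, we cannot simply use this method for the multisample corpuscle problem, since $m_{2l}$ and $m_{1l}$ are unknown due to the indirect measurement. Multiplying the biased sampling densities of §2 by the conditional densities of the observations given the radius, and normalizing, gives the conditional densities of the latent radii given the observations:

$$f_{\tilde X \mid Y}(x \mid y) = \frac{f(x)\,(x^2-y^2)^{-1/2}\,I(x > y)}{\int_y^\infty f(s)\,(s^2-y^2)^{-1/2}\,ds}, \qquad f_{X^* \mid Z}(x \mid z) = \frac{f(x)\,I(x > z)}{\int_z^\infty f(s)\,ds}.$$

Since the loglikelihood based on (4) is linear in $m_{2l}$ and $m_{1l}$, we can proceed with an EM algorithm whose E-step replaces these multiplicities by their conditional expectations: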
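When the latent radii are observed, maximizing (4) subject to (5) has a simple fixed-point characterization. Introduce a Lagrange multiplier for $\sum_l p_l = 1$. Setting the derivative of the log of (4) to zero shows that the multiplier equals $n_3$, and gives $p_l = (m_{3l} + m_{2l} + m_{1l})/(n_3 + n_2 t_l/\mu_1 + n_1 t_l^2/\mu_2)$ with $\mu_1 = \sum_l t_l p_l$ and $\mu_2 = \sum_l t_l^2 p_l$. The sketch below simply iterates these equations. It is a plain fixed-point scheme and not necessarily the Mallows (1985) algorithm cited above; the function name is hypothetical. With only length-biased data ($n_3 = n_1 = 0$), it returns masses proportional to $m_{2l}/t_l$.

```python
def biased_sampling_npmle(support, m3, m2, m1, tol=1e-12, max_iter=10_000):
    """Maximize (4) subject to (5) when all radii are observed (complete data).

    Iterates the stationarity equations
        p_l = (m3_l + m2_l + m1_l) / (n3 + n2 t_l / mu1 + n1 t_l^2 / mu2),
        mu1 = sum_l t_l p_l,  mu2 = sum_l t_l^2 p_l,
    renormalizing p to sum to one after each step.
    """
    n3, n2, n1 = sum(m3), sum(m2), sum(m1)
    counts = [a + b + c for a, b, c in zip(m3, m2, m1)]
    p = [1.0 / len(support)] * len(support)
    for _ in range(max_iter):
        mu1 = sum(t * q for t, q in zip(support, p))
        mu2 = sum(t * t * q for t, q in zip(support, p))
        new = [c / (n3 + n2 * t / mu1 + n1 * t * t / mu2) for c, t in zip(counts, support)]
        total = sum(new)
        new = [q / total for q in new]
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


# Length-biased sample only: masses proportional to m2_l / t_l.
print(biased_sampling_npmle([1.0, 2.0, 4.0], [0, 0, 0], [1, 1, 1], [0, 0, 0]))
```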

Algorithm 1. —

EM algorithm for computing the spherical radii distribution.

Step 1. —

Initialize $p^{(0)} = (p^{(0)}_1,\dots,p^{(0)}_m)$ with $p^{(0)}_l > 0$ and $\sum_{l=1}^m p^{(0)}_l = 1$.

Step 2. —

For $r = 0, 1, 2, \dots$,

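(E-step) Calculate

$$\hat m_{2l}^{(r)} = \sum_{j=1}^{n_2}\frac{p_l^{(r)}\,(t_l^2 - Y_j^2)^{-1/2}\,I(t_l > Y_j)}{\sum_{s=1}^{m} p_s^{(r)}\,(t_s^2 - Y_j^2)^{-1/2}\,I(t_s > Y_j)}, \qquad \hat m_{1l}^{(r)} = \sum_{k=1}^{n_1}\frac{p_l^{(r)}\,I(t_l \geq Z_k)}{\sum_{s=1}^{m} p_s^{(r)}\,I(t_s \geq Z_k)}, \qquad l = 1,\dots,m.$$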

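(M-step) Maximize (4) subject to (5) with $m_{2l}$ and $m_{1l}$ replaced by $\hat m^{(r)}_{2l}$ and $\hat m^{(r)}_{1l}$ from the E-step. The corresponding nonparametric maximum likelihood estimator of Vardi (1985) is

$$p_l^{(r+1)} = \frac{m_{3l} + \hat m^{(r)}_{2l} + \hat m^{(r)}_{1l}}{n_3 + n_2\,t_l/\hat\mu_1 + n_1\,t_l^2/\hat\mu_2}, \qquad l = 1,\dots,m,$$

where $\hat\mu_1$ and $\hat\mu_2$ satisfy the equations

$$\hat\mu_1 = \sum_{l=1}^{m} t_l\,p_l^{(r+1)}, \qquad \hat\mu_2 = \sum_{l=1}^{m} t_l^2\,p_l^{(r+1)}.$$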

Step 3. —

Repeat Step 2 until a convergence criterion is met. We denote the final estimate by $\hat p = (\hat p_1,\dots,\hat p_m)$. The nonparametric maximum likelihood estimate of $F$ is

$$\hat F(x) = \sum_{l=1}^{m} \hat p_l\,I(t_l \leq x).$$
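Algorithm 1 can be sketched in plain Python as follows. The sketch makes these assumptions:

- the support is the set of distinct observed values;
- the largest observation is not a planar profile, since otherwise the E-step for that profile has no admissible support point;
- the M-step solves the equations of Step 2 by fixed-point iteration;
- all function names and the toy data-generating settings are hypothetical.

The loglikelihood is recorded at every iteration so that the ascent property of EM can be checked.

```python
import math
import random


def _loglik(support, p, m3, y2, z1):
    """Observed-data loglikelihood (2) for a discrete F, up to an additive constant."""
    mu1 = sum(t * q for t, q in zip(support, p))
    mu2 = sum(t * t * q for t, q in zip(support, p))
    ll = sum(c * math.log(q) for c, q in zip(m3, p) if c > 0)
    for y in y2:
        ll += math.log(sum(q / math.sqrt(t * t - y * y) for t, q in zip(support, p) if t > y) / mu1)
    for z in z1:
        ll += math.log(sum(q for t, q in zip(support, p) if t >= z) / mu2)
    return ll


def _m_step(support, counts, n3, n2, n1, p, tol=1e-13, max_iter=1000):
    """Biased-sampling NPMLE with expected counts: p_l = c_l / (n3 + n2 t_l/mu1 + n1 t_l^2/mu2)."""
    for _ in range(max_iter):
        mu1 = sum(t * q for t, q in zip(support, p))
        mu2 = sum(t * t * q for t, q in zip(support, p))
        new = [c / (n3 + n2 * t / mu1 + n1 * t * t / mu2) for c, t in zip(counts, support)]
        total = sum(new)
        new = [q / total for q in new]
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


def wicksell_em(x3, y2, z1, tol=1e-10, max_iter=300):
    """Sketch of Algorithm 1; returns (support t_l, masses p_l, loglikelihood trace)."""
    support = sorted(set(x3) | set(y2) | set(z1))
    if y2 and max(y2) >= support[-1]:
        raise ValueError("sketch assumes the largest observation is not a planar profile")
    idx = {t: l for l, t in enumerate(support)}
    m3 = [0] * len(support)
    for x in x3:
        m3[idx[x]] += 1
    n3, n2, n1 = len(x3), len(y2), len(z1)
    p = [1.0 / len(support)] * len(support)
    trace = [_loglik(support, p, m3, y2, z1)]
    for _ in range(max_iter):
        counts = [float(c) for c in m3]
        for y in y2:  # E-step (planar): pr(t_l | y) ∝ p_l (t_l^2 - y^2)^(-1/2), t_l > y
            w = [q / math.sqrt(t * t - y * y) if t > y else 0.0 for t, q in zip(support, p)]
            s = sum(w)
            counts = [c + wl / s for c, wl in zip(counts, w)]
        for z in z1:  # E-step (linear): pr(t_l | z) ∝ p_l, t_l >= z
            w = [q if t >= z else 0.0 for t, q in zip(support, p)]
            s = sum(w)
            counts = [c + wl / s for c, wl in zip(counts, w)]
        new = _m_step(support, counts, n3, n2, n1, p)  # M-step
        trace.append(_loglik(support, new, m3, y2, z1))
        change = max(abs(a - b) for a, b in zip(new, p))
        p = new
        if change < tol:
            break
    return support, p, trace


def ecdf_from_masses(support, p, x):
    """F-hat(x) = sum of p_l over support points t_l <= x."""
    return sum(q for t, q in zip(support, p) if t <= x)


# Toy data with radii X ~ Uniform(0, 1); a radius with density ∝ x^k is U^(1/(k+1)).
rng = random.Random(2016)
x3 = [rng.random() for _ in range(20)]
y2 = []
for _ in range(40):
    r = rng.random() ** 0.5                      # planar probe: size-biased density 2x
    d = r * rng.random()                         # distance from centre to plane
    y2.append(math.sqrt(r * r - d * d))
z1 = [rng.random() ** (1 / 3) * math.sqrt(rng.random()) for _ in range(20)]  # linear probe
y2 = [y for y in y2 if y < max(x3 + z1)]         # respect the sketch's support assumption
support, p, trace = wicksell_em(x3, y2, z1)
print(len(trace), ecdf_from_masses(support, p, 0.5))
```

The E-step only redistributes each observation's unit mass over support points above it, and the M-step reweights by the inverse of the sampling bias. Neither step requires solving the Abel-type integral equation.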

In the Appendix, we show that the loglikelihood function is strictly concave, and thus has a unique maximizer $\hat p$. Following Csiszár & Tusnády (1984), the estimate based on the EM algorithm converges to the nonparametric maximum likelihood estimate $\hat p$, since the set of all probability measures over which the likelihood is maximized is convex.

Wicksell (1925) used the Poisson assumption in his original derivation, and most subsequent statistical papers have retained it. However, the core of Wicksell's solution can be derived in more general settings. That core is the Abel-type integral equation (1), which combines the measurement problem and the sampling bias problem. Mecke & Stoyan (1980) showed that (1) holds when the centres of the spheres follow a stationary point process. This covers the case where overlapping spheres are removed (Bartlett, 1974) and the remaining spheres are weakly correlated. Jensen (1984) showed that (1) holds when the non-overlapping spheres are deterministic but the location of the planar probe is random. Under these assumptions, the measurement and sampling bias problems are the same as in Wicksell's original problem. The proposed method therefore still maximizes the likelihood function (2), which is an independence likelihood (Lindsay, 1988) when the spheres are correlated. In one-sample problems, Heinrich (2007) showed that certain maximum independence likelihood estimators are asymptotically normal when the spheres are weakly correlated. Simulations evaluating the proposed method when the Poisson assumption is violated are reported in §6.1.

Remark 1. —

Spherical assumptions are common in stereology. Anderssen & Jakeman (1974) examined the sensitivity of results to departures from the random-spheres approximation and concluded that the approximation is quite reliable. For particles of random shape, there is no general relationship between the size distribution of three-dimensional particles and that of their two-dimensional sections. Our method can be generalized to some particular cases, such as ellipsoids, where an explicit relationship between the sizes of three-dimensional particles and two-dimensional sections is available.

4. Extension to ellipsoids

Although widely studied, the original Wicksell problem only considers univariate size distributions with a constant spherical shape. For particles with variation in both shape and size, it is desirable to estimate the joint distribution of a size and a shape measure. A useful model is the ellipsoid model, which has been studied since Wicksell (1926) under independence between size and shape, while Cruz-Orive (1976) studied a general mathematical formulation. Although the problem is much more complicated than the spherical case, two-dimensional indirect observations of three-dimensional ellipsoids are subject to the same statistical problems, so the proposed EM algorithm for spheres can be extended to ellipsoids. To illustrate the ideas, we first consider the case where we only have two-dimensional indirect observations. As discussed in Cruz-Orive (1976), one cannot nonparametrically identify the joint distribution of axes of a triaxial ellipsoid. Instead, one can only identify the joint distribution of the major semiaxes Inline graphic and the minor semiaxes Inline graphic of a biaxial ellipsoid, which could be prolate or oblate. For simplicity we consider prolate ellipsoids; the derivation is very similar for oblate ellipsoids. We adopt the reparameterization of Cruz-Orive (1976) and consider estimation of the joint distribution of Inline graphic and the eccentricity parameter Inline graphic. The joint density and distribution of Inline graphic are denoted by Inline graphic and Inline graphic. The two-dimensional observed ellipses are Inline graphic.

Let Inline graphic. If we can observe the ellipsoids that intersect the probe, Cruz-Orive (1976) showed that the sampling distribution of Inline graphic is

4. (6)

Given an ellipsoid with minor semiaxis Inline graphic and eccentricity Inline graphic that intersects the two-dimensional probe, the observed Inline graphic is subject to indirect measurement such that Inline graphic and Inline graphic, and

4.

Therefore, the sampling density of Inline graphic is

4.

where Inline graphic.

In an EM algorithm, the E-step can be constructed by considering the sampling conditional density of Inline graphic given the observed Inline graphic,

4. (7)

and the M-step can be constructed by (6). To maximize the likelihood function over the observed data points, let Inline graphicInline graphic. The EM algorithm is stated as follows.

Algorithm 2. —

EM algorithm for ellipsoids.

Step 1. —

Initialize Inline graphic.

Step 2. —

For Inline graphic,

(E-step) Following (7), the E-step is

graphic file with name M94.gif

(M-step) Following (6), the M-step is

graphic file with name M95.gif

Step 3. —

Repeat Step 2 until a convergence criterion is met. We denote the final estimate by Inline graphic. The corresponding estimate of Inline graphic is

graphic file with name M98.gif
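Algorithm 2 follows the same pattern as Algorithm 1. Its E-step spreads each observed section over candidate particles, and its M-step undoes the size bias; this structure can be sketched generically. The explicit forms of (6) and (7) are not reproduced in this copy, so the sketch makes three assumptions:

- (6) has the size-biased form $w(x,t)\,f(x,t)/\int w\,dF$;
- the bias weight $w$ is a user-supplied function;
- the conditional density of a section given the particle, from which the E-step weights (7) are formed, is also user-supplied.

`biased_indirect_em`, `weight` and `kernel` are hypothetical names. The check uses the spherical two-dimensional kernel as a stand-in for the spheroid kernel.

```python
import math


def biased_indirect_em(support, obs, weight, kernel, n_iter=500):
    """EM for one biased, indirectly measured sample (hypothetical generic helper).

    support       -- candidate latent points s_l (e.g. (minor semiaxis, eccentricity) pairs)
    weight(s)     -- sampling-bias weight: the probe hits particle s with probability ∝ weight(s)
    kernel(o, s)  -- conditional density of an observed section o given that particle s is hit
    Assumes every observation has kernel(o, s_l) > 0 for at least one support point.
    """
    w = [weight(s) for s in support]
    p = [1.0 / len(support)] * len(support)
    for _ in range(n_iter):
        counts = [0.0] * len(support)
        for o in obs:
            # E-step: pr(latent = s_l | o) ∝ p_l * weight(s_l) * kernel(o, s_l)
            a = [pl * wl * kernel(o, s) for pl, wl, s in zip(p, w, support)]
            total = sum(a)
            counts = [c + al / total for c, al in zip(counts, a)]
        # M-step for a single biased sample: p_l ∝ counts_l / weight(s_l)
        q = [c / wl for c, wl in zip(counts, w)]
        total = sum(q)
        p = [ql / total for ql in q]
    return p


def sphere_kernel(y, x):
    """Stand-in kernel: density of a planar profile radius y given sphere radius x."""
    return y / (x * math.sqrt(x * x - y * y)) if x > y else 0.0


p = biased_indirect_em([1.0, 2.0], [0.5], weight=lambda x: x, kernel=sphere_kernel)
print(p)
```

With the spherical stand-in, the weight $w(x) = x$ combined with the profile density reproduces the E-step weights of Algorithm 1, which are proportional to $p_l(t_l^2 - y^2)^{-1/2}$.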

Remark 2. —

We are often interested in estimating the distribution function of certain summary variables, such as axial ratio or volume. In general, these are functions of Inline graphic and Inline graphic. Since the joint distribution of Inline graphic can be estimated, the distribution of functions of Inline graphic can also be estimated. For example, let Inline graphic be the ratio of the minor axis to the major axis. The distribution function of Inline graphic can be estimated by

graphic file with name M105.gif

For another example, the distribution function of Inline graphic, the volume of an ellipsoid, can be estimated by

graphic file with name M107.gif

where Inline graphic, Inline graphic and Inline graphic. McGarrity et al. (2014) considered the estimation of univariate summary measures that are functions of radii and heights of cylinders. Their method is specially designed for the case where the height of the cylinders does not suffer from an indirect measurement problem, and cannot be directly extended to estimate the axial ratio distribution or the volume distribution of ellipsoids where a bivariate measurement problem is present.
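The plug-in idea of Remark 2 amounts to summing the estimated masses of support points whose summary value falls below a threshold. The sketch below assumes the support points are stored as (minor semiaxis $b$, eccentricity $e$) pairs with the standard definition $e^2 = 1 - b^2/a^2$ for major semiaxis $a$. Whether this matches the Cruz-Orive (1976) parameterization used above is an assumption. The helper names and the toy masses are hypothetical.

```python
import math


def plugin_summary_cdf(support, probs, summary, r):
    """Plug-in estimate of pr{summary(S) <= r} from a discrete NPMLE (hypothetical helper)."""
    return sum(p for s, p in zip(support, probs) if summary(s) <= r)


def axial_ratio(s):
    """Minor/major axis ratio of a prolate spheroid, assuming s = (b, e) with e^2 = 1 - b^2 / a^2."""
    b, e = s
    return math.sqrt(1.0 - e * e)


def volume(s):
    """Volume (4/3) pi a b^2 of a prolate spheroid under the same (b, e) assumption."""
    b, e = s
    a = b / math.sqrt(1.0 - e * e)           # major semiaxis recovered from b and e
    return 4.0 / 3.0 * math.pi * a * b * b


support = [(1.0, 0.0), (1.0, 0.6), (2.0, 0.8)]   # toy (minor semiaxis, eccentricity) points
probs = [0.5, 0.3, 0.2]                          # toy NPMLE masses
print(plugin_summary_cdf(support, probs, axial_ratio, 0.85))   # axial ratios ≈ 1.0, 0.8, 0.6
print(plugin_summary_cdf(support, probs, volume, 10.0))        # volumes ≈ 4.19, 5.24, 55.9
```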

Remark 3. —

As shown in Cruz-Orive (1976), a two-dimensional sample cannot identify the joint trivariate distribution of a principal semiaxis and two principal eccentricities from a sample which is a mixture of the prolate and oblate spheroids. The difficulty is primarily due to the non-identifiability of mixture proportions. Using an unbiased three-dimensional sample, however, allows us to identify the proportion of prolate and oblate spheroids in the samples. When the estimated proportion is treated as fixed, we can modify the E-step to distribute a fraction of masses to prolate and oblate spheroids respectively, and the M-step to reweight a mixture of biased samples.

5. Connections to multiplicative censoring

Vardi (1989) and Vardi & Zhang (1992) considered the following multiplicative censoring problem, in which two independent samples are available. The first sample, $S_1,\dots,S_N$, consists of independent and identically distributed complete uncensored observations with density $f$. The second consists of incomplete observations $T_j = \tilde S_j U_j$ $(j = 1,\dots,M)$, where $\tilde S_1,\dots,\tilde S_M$ have density $f$ and $U_1,\dots,U_M$ are independent standard uniform variables, independent of the $\tilde S_j$. The incomplete observations are thus random fractions of the complete observations. Vardi (1989) showed that this multiplicative censoring structure is present in three unrelated applications: nonparametric estimation in renewal processes, deconvolution, and estimation of a monotone decreasing density. The likelihood function for this problem is

$$L(f) = \prod_{i=1}^{N} f(S_i)\,\prod_{j=1}^{M}\int_{T_j}^\infty s^{-1} f(s)\,ds.$$

Since, given $\tilde S_j = s$, $T_j$ is uniformly distributed on $(0, s)$, Vardi (1989) proposed an EM algorithm whose E-step assigns weights according to the conditional density of $\tilde S_j$ given $T_j = y$, which is $s^{-1} f(s) / \int_y^\infty v^{-1} f(v)\,dv$ for $s \geq y$.
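For contrast with Algorithm 1, Vardi's (1989) EM can be sketched in a few lines. For each censored observation, the E-step weights are proportional to $p_l/t_l$ on the support points at or above that observation. The M-step is an unweighted empirical distribution, because complete and reconstructed observations share the density $f$. The support choice (distinct observed values) and the function name are assumptions of this sketch.

```python
def vardi_mc_em(complete, censored, n_iter=2000):
    """EM sketch for the multiplicative censoring NPMLE of Vardi (1989).

    complete: fully observed values with density f; censored: values of the form
    (latent draw from f) * Uniform(0, 1). Masses are placed on the distinct
    positive values of both samples.
    """
    support = sorted(set(complete) | set(censored))
    idx = {t: l for l, t in enumerate(support)}
    base = [0.0] * len(support)
    for s in complete:
        base[idx[s]] += 1.0
    n = len(complete) + len(censored)
    p = [1.0 / len(support)] * len(support)
    for _ in range(n_iter):
        counts = list(base)
        for y in censored:
            # E-step: pr(latent = t_l | censored value y) ∝ p_l / t_l for t_l >= y
            w = [q / t if t >= y else 0.0 for t, q in zip(support, p)]
            s = sum(w)
            counts = [c + wl / s for c, wl in zip(counts, w)]
        # M-step: no sampling bias, so the update is an unweighted empirical distribution.
        p = [c / n for c in counts]
    return support, p


support, p = vardi_mc_em([1.0, 2.0, 3.0], [0.5, 2.5])
print([round(q, 4) for q in p])
```

The only structural difference from Algorithm 1 lies in the M-step. Here it divides by the total count $n$, whereas Algorithm 1 divides by the sampling-bias factor $n_3 + n_2 t_l/\mu_1 + n_1 t_l^2/\mu_2$.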

The multisample corpuscle problem can be viewed as the following multiplicative censoring problem. There are independent three-dimensional observations $X_1,\dots,X_{n_3}$, two-dimensional observations $Y_1,\dots,Y_{n_2}$ and one-dimensional observations $Z_1,\dots,Z_{n_1}$, where:

- $Y_j$ and $\tilde X_j(1-U_j^2)^{1/2}$ are equally distributed;
- $Z_k$ and $X^*_k V_k^{1/2}$ are equally distributed;
- $U_j$ and $V_k$ are independent standard uniform variables;
- the density function of $\tilde X_j$ is $x f(x)/\mu_1$ and the density function of $X^*_k$ is $x^2 f(x)/\mu_2$.

It can be shown that

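$$\mathrm{pr}\{\tilde X_j(1-U_j^2)^{1/2} \leq y \mid \tilde X_j = x\} = \mathrm{pr}\{U_j \geq (1 - y^2/x^2)^{1/2}\} = 1 - \frac{(x^2-y^2)^{1/2}}{x}, \qquad 0 < y < x,$$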

whose derivative with respect to $y$ is $y/\{x(x^2-y^2)^{1/2}\}$, and hence $Y_j$ has the sampling density (1). Also,

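$$\mathrm{pr}(X^*_k V_k^{1/2} \leq z \mid X^*_k = x) = \mathrm{pr}(V_k \leq z^2/x^2) = \frac{z^2}{x^2}, \qquad 0 < z < x,$$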

and therefore $Z_k$ has the sampling density of the one-dimensional observations derived in §2. Thus, the likelihood function of this multiplicative censoring problem is equivalent to (2).

The connection to the multiplicative censoring problem of Vardi (1989) explains both the similarities and the differences between the proposed EM algorithm and that of Vardi (1989). Vardi's E-step assigns weights to the censored observations according to the conditional density of $\tilde S_j$ given $T_j$. The proposed algorithm instead weights the two-dimensional observations by the conditional density of $\tilde X_j$ given $Y_j$, and the one-dimensional observations by the conditional density of $X^*_k$ given $Z_k$. The multiplicative censoring formulation also explains the difference in the M-steps: in Vardi (1989), $S_i$ and $\tilde S_j$ are identically distributed, whereas here $\tilde X_j$ and $X^*_k$ are biased versions of $X_i$.

6. Numerical examples

6.1. Simulations

We conducted numerical studies to examine the finite-sample properties of the estimators proposed in §3. For each simulation scenario, 5000 independent datasets were generated. In the first set of simulations, each dataset consists of independent observations from three-dimensional, two-dimensional and one-dimensional samples. We present the results when $X$ was generated from a uniform distribution. We performed additional simulations for beta-distributed $X$; the results were qualitatively similar and are not presented. Table 1 shows the performance of the proposed three-sample estimator for different sample sizes. In general, the proposed estimator had a negligible small-sample bias at a wide range of percentile points of the distribution of $X$. The bias also decreased with sample size, supporting the consistency of the estimator, and the sampling variability of the estimator decreased as the sample size increased. Case (e) had unequal sample sizes among the samples. It had the same total number of observations as case (b), but the same number of three-dimensional observations as case (a). Three-dimensional observations were the most informative about the three-dimensional radii distribution, so the sampling variability in case (e) lay between those in cases (a) and (b). Comparing (a) and (e), the additional two-dimensional and one-dimensional observations were more informative about the upper tail of the distribution, because sampling bias makes larger objects more likely to be sampled.

Table 1.

Pointwise performance of the proposed 3-sample estimator for various sample sizes

              (a)          (b)          (c)          (d)          (e)
Percentile    Bias   SD    Bias   SD    Bias   SD    Bias   SD    Bias   SD
10            -1     42     1     30    -1     20    -1     14    -1     41
30            -2     62    -1     43     1     28    -1     20    -1     56
50            -2     63    -1     42    -1     30    -1     22    -1     52
70            -2     53    -1     35    -1     25    -1     18     1     43
90            -1     33    -1     23    -1     16    -1     10    -1     25
95            -1     24    -1     17    -1     12    -1      8    -1     20
97.5          -1     20    -1     13    -1      9    -1      6     1     17

The values of Bias and SD were multiplied by 1000, and SD represents the sampling standard deviation. (a) Inline graphic, (b) Inline graphic, (c) Inline graphic, (d) Inline graphic, (e) Inline graphic.

Table 2 compares estimators using observations from different samples. We compared the three-sample estimator given in §3 with two-sample and one-sample estimators, which are special cases of the proposed method, and with the empirical distribution function based only on the three-dimensional observations. Compared with the estimator using three-dimensional data only, including the additional samples improved estimation efficiency and left the small-sample bias unaffected. Both the small-sample bias and the variability increased when information from three-dimensional data was not used.

Table 2.

Pointwise performance of the estimators using observations from different samples

Samples       (1,2,3)      (2,3)        (3)          (1,2)        (2)
Percentile    Bias   SD    Bias   SD    Bias   SD    Bias   SD    Bias   SD
10            -1     20     1     21    -1     21     2     73    -16    76
30             1     28    -1     31    -2     34    16     79      1    83
50            -1     30    -1     32    -2     37    29     61      8    67
70            -1     25     1     26    -1     32    22     42      9    50
90            -1     16    -1     17    -1     22    12     23      6    28
95            -1     12    -1     13    -1     15     8     18      4    22
97.5          -1      9    -1     10    -1     11     6     14      3    17

The sample sizes were Inline graphic. The values of Bias and SD were multiplied by 1000, and SD represents the sampling standard deviation. Numbers in the brackets indicate the dimensions of observations being included.

Next, we study the performance of the estimators when the Poisson process assumption is violated. To generate a weakly correlated population of spheres, we follow a model of Bartlett (1974), where spheres are sequentially generated and a sphere is removed if it overlaps with any existing spheres. The resulting process is a stationary marked point process and the spheres are weakly dependent. We study the performance of the estimators by varying the truncation fraction, that is, the fraction of spheres that are removed. Table 3 shows that estimation bias increases slightly but the sampling variability remains similar when the degree of overlapping increases.

Table 3.

Pointwise performance of the estimators when the Poisson process assumption is violated

Inline graphic Inline graphic Inline graphic Inline graphic
Percentile    Bias   SD    Bias   SD    Bias   SD    Bias   SD
10            -1     42    -2     30    -1     32    -2     33
30             1     43    -4     42    -9     43   -11     43
50            -1     42    -9     42   -15     41   -19     41
70            -1     36   -10     36   -16     35   -20     33
90            -1     23    -5     23    -9     23   -11     21

The sample sizes were Inline graphic. The values of Bias and SD were multiplied by 1000, SD represents the sampling standard deviation, and Inline graphic is the fraction of spheres that are removed due to overlapping.

We conducted further simulations to evaluate the rate of convergence of the estimators under the Poisson process assumption. We compared two scenarios: only two-dimensional observations are available, or 10% of the observations are three-dimensional and 90% are two-dimensional. According to Groeneboom & Jongbloed (1995), the minimax rate of convergence for two-dimensional observations is Inline graphic, so Inline graphic times the sampling standard deviation Inline graphic of the estimates Inline graphic should stabilize as Inline graphic increases. That is, Inline graphic for some constant Inline graphic. We reran the simulations with Inline graphic and 5000 and evaluated the sampling standard deviation of the estimated distribution function at the 10th, 30th, 50th, 70th and 90th percentiles of the true distribution. We fitted a linear model with outcome Inline graphic, predictor Inline graphic and dummy variables for the different percentiles. The estimated regression coefficient of Inline graphic from the simulations was Inline graphic, with Inline graphic confidence interval Inline graphic, consistent with the theoretical prediction. For a combination of three-dimensional and two-dimensional observations, we conjecture that the rate of convergence is Inline graphic, which corresponds to a true regression coefficient of 0. Using 10% three-dimensional and 90% two-dimensional data, the estimated regression coefficient of Inline graphic from the simulations was Inline graphic, with Inline graphic confidence interval Inline graphic, so we cannot reject the null hypothesis that the rate of convergence is Inline graphic. The rates of convergence therefore differed between the two scenarios, even though the second sample was dominated by two-dimensional observations.

6.2. Data analysis

We applied the proposed method to estimate the diameter distribution of a nickel-based superalloy Inconel 100, using combined three-dimensional, two-dimensional and one-dimensional data derived from Tucker et al. (2012). The two-dimensional and one-dimensional observations were obtained by planar and linear sections of the sample materials, and the three samples contain non-overlapping particles. The data contain 84 particles from four one-dimensional sections and 254 particles from a two-dimensional section. Since three-dimensional measurements are more costly to obtain, we included a sample of 120 particles from the three-dimensional observations for illustration.

The estimates of the cumulative distribution are shown in Fig. 2. Figure 2(a) shows the estimates over the full support of the diameter distribution. The empirical distribution of the two-dimensional sample overestimated the proportion of particles with small diameters, but not in the upper tail of the diameter distribution. Figure 2(b) zooms into the lower tail of the distribution, where the empirical distribution of the two-dimensional sample is biased. For diameters less than Inline graphicm and cumulative probabilities less than 10%, the nonparametric maximum likelihood estimate based on the combined samples was nearly identical to the empirical distribution of the three-dimensional observations. This pattern is very similar to that seen in Table 2. There, the efficiency gain of the combined-sample estimator is more noticeable above the 10th percentile, which is also where the estimated cumulative distribution deviates slightly from the empirical distribution of the three-dimensional observations. Based on 1000 bootstrap replicates, the estimated standard errors for the proportion of particles with diameter less than Inline graphicm were Inline graphic and Inline graphic for the combined-sample estimate and the three-dimensional sample estimate respectively.

Fig. 2.


Distribution function estimates of the diameter of Inconel 100 particles. (a) Full support, (b) the lower tail of the distribution. Solid lines represent the proposed estimator, dashed lines represent the empirical distribution based on the two-dimensional sample, dotted lines represent the empirical distribution based on the three-dimensional sample.

7. Concluding remarks

The corpuscle problem also arises for objects in dimensions other than three. The mathematical formulation of the $d$-dimensional corpuscle problem can be found in Heinrich (2007). For any $d \geq 3$, the relationships between $d$-, $(d-1)$- and $(d-2)$-dimensional observations remain the same. Therefore, our method can be applied to the general $d$-dimensional problem with a combination of $d$-, $(d-1)$- and $(d-2)$-dimensional observations. Nonetheless, $d = 3$ covers most applications of practical interest.

We have shown that the proposed EM algorithm converges to the nonparametric maximum likelihood estimate. We will study its large-sample properties in future research. The single-sample estimators based on three-dimensional, two-dimensional and one-dimensional observations have rates of convergence $n^{1/2}$, $(n/\log n)^{1/2}$ and $n^{1/3}$ respectively, as given in unpublished 1999 Vrije Universiteit lecture notes by G. Jongbloed. Vardi & Zhang (1992) studied the large-sample properties of the nonparametric maximum likelihood estimator proposed in Vardi (1989) for the two-sample multiplicative censoring problem discussed in §5. In that problem, the one-sample estimators have different convergence rates, $n^{1/2}$ and $n^{1/3}$. Their two-sample estimator attains an $n^{1/2}$ convergence rate and is asymptotically more efficient than the one-sample estimator with $n^{1/2}$ rate of convergence. Our setting is substantially different, since their problem did not involve multisample sampling bias. As a result, the M-step of Vardi's algorithm involves only an empirical distribution, whereas the M-step of our algorithm involves nonparametric estimation under multiple biased samples. We expect that proving theoretical results for the proposed method will be more difficult than in Vardi & Zhang (1992). Nevertheless, their theoretical results and our simulation results lead us to conjecture that our three-sample estimator has an $n^{1/2}$ rate of convergence when the fraction of three-dimensional observations converges to a nonnegligible proportion.

Acknowledgement

The authors thank the editor, an associate editor and three reviewers for their helpful comments and suggestions. They also thank Joe Tucker and Anthony Rollett for providing the Inconel 100 data. The first author is partially supported by the National Heart, Lung, and Blood Institute of the U.S. National Institutes of Health.

Appendix. Concavity of the loglikelihood function

Let Inline graphic, Inline graphic, and Inline graphic, Inline graphic and Inline graphic, which are the multiplicities of the three-dimensional sample, two-dimensional sample and one-dimensional sample at Inline graphic respectively. Furthermore, let Inline graphic, Inline graphic, Inline graphic and Inline graphic. The likelihood function can be written as


so the loglikelihood can be expressed as Inline graphic where Inline graphic, Inline graphic, Inline graphic, Inline graphic. Let Inline graphic and Inline graphic be the Hessian of Inline graphic. For Inline graphic,


and since Inline graphic, Inline graphic and Inline graphic with Inline graphic, this quadratic form is strictly negative unless Inline graphic. Therefore Inline graphic is strictly concave. To show the concavity of Inline graphic, we consider the transformation Inline graphic and denote by Inline graphic the Hessian of Inline graphic. For Inline graphic,


by the Cauchy–Schwarz inequality. Since Inline graphic is strictly concave and Inline graphic is concave, Inline graphic is strictly concave.
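The displayed likelihood and Hessian formulas in this appendix did not survive extraction, so the following is only a numerical sketch of the two kinds of argument used above, under an assumed generic form $\ell(p) = \sum_i a_i \log p_i + \sum_j b_j \log (Kp)_j$. Here $a_i > 0$ plays the role of directly observed multiplicities, $b_j \ge 0$ of indirectly observed ones, and $K$ is a hypothetical nonnegative kernel matrix; none of these is the paper's exact notation. The second function illustrates a standard Cauchy–Schwarz step for $g(\theta) = \log \sum_i w_i e^{\theta_i}$, whose Hessian $\mathrm{diag}(\pi) - \pi\pi^{\mathrm T}$ satisfies $v^{\mathrm T}(\mathrm{diag}(\pi) - \pi\pi^{\mathrm T})v = \sum_i \pi_i v_i^2 - (\sum_i \pi_i v_i)^2 \ge 0$. This exponential reparametrization is an assumption, not necessarily the transformation used in the proof. The function names `loglik_hessian` and `logsumexp_hessian` are hypothetical.

```python
import numpy as np


def loglik_hessian(p, a, b, K):
    """Analytic Hessian of the assumed generic loglikelihood
    l(p) = sum_i a_i log p_i + sum_j b_j log((K p)_j).

    a: multiplicities of directly observed support points (assumed > 0)
    b: multiplicities of indirectly observed values (assumed >= 0)
    K: hypothetical nonnegative kernel; K[j, i] links support point i to value j
    """
    Kp = K @ p
    # Direct part: -diag(a_i / p_i^2), negative definite when every a_i > 0.
    H = -np.diag(a / p**2)
    # Indirect part: -sum_j b_j k_j k_j^T / (k_j^T p)^2, negative semidefinite.
    H -= K.T @ np.diag(b / Kp**2) @ K
    return H


def logsumexp_hessian(theta, w):
    """Hessian of g(theta) = log sum_i w_i exp(theta_i), equal to
    diag(pi) - pi pi^T with pi_i = w_i exp(theta_i) / sum_k w_k exp(theta_k).
    By Cauchy-Schwarz, v^T H v = sum pi v^2 - (sum pi v)^2 >= 0, so -g is concave.
    """
    z = w * np.exp(theta - theta.max())  # shift for stability; pi is unchanged
    pi = z / z.sum()
    return np.diag(pi) - np.outer(pi, pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 5
    p = rng.dirichlet(np.ones(m))
    a = rng.integers(1, 4, size=m).astype(float)
    b = rng.integers(0, 4, size=3).astype(float)
    K = rng.uniform(0.0, 1.0, size=(3, m))

    H = loglik_hessian(p, a, b, K)
    print(np.linalg.eigvalsh(H).max() < 0)  # True: strictly concave at p

    theta = rng.normal(size=m)
    w = rng.uniform(0.5, 2.0, size=m)
    G = logsumexp_hessian(theta, w)
    print(np.linalg.eigvalsh(G).min() >= -1e-12)  # True: g convex, so -g concave
```

In this generic sketch, strictness comes from the $a_i \log p_i$ terms alone; the indirect terms and the Cauchy–Schwarz piece only add semidefinite contributions, which is why a sum of a strictly concave and a concave function remains strictly concave.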

References

  1. Anderssen R. S. & Jakeman A. J. (1974). Abel type integral equations in stereology. II. Computational methods of solution and the random spheres approximation. J. Microsc. 105, 135–53.
  2. Antoniadis A., Fan J. & Gijbels I. (2001). A wavelet method for unfolding sphere size distributions. Can. J. Statist. 29, 251–68.
  3. Bartlett M. S. (1974). The statistical analysis of spatial pattern. Adv. Appl. Prob. 6, 336–58.
  4. Chiu S. N., Stoyan D., Kendall W. S. & Mecke J. (2013). Stochastic Geometry and its Applications. Chichester: John Wiley & Sons, 3rd ed.
  5. Cruz-Orive L. M. (1976). Particle size-shape distributions: the general spheroid problem. I. Mathematical model. J. Microsc. 107, 235–53.
  6. Csiszár I. & Tusnády G. (1984). Information geometry and alternating minimization procedures. In Statistics & Decisions, Suppl. 1, E. J. Dudewicz et al., eds. Munich: R. Oldenbourg Verlag, pp. 205–37.
  7. Davidov O. & Iliopoulos G. (2010). A note on an iterative algorithm for nonparametric estimation in biased sampling models. Comp. Statist. Data Anal. 54, 620–4.
  8. Golubev G. K. & Levit B. Y. (1998). Asymptotically efficient estimation in the Wicksell problem. Ann. Statist. 26, 2407–19.
  9. Groeneboom P. & Jongbloed G. (1995). Isotonic estimation and rates of convergence in Wicksell's problem. Ann. Statist. 23, 1518–42.
  10. Hall P. & Smith R. L. (1988). The kernel method for unfolding sphere size distributions. J. Comp. Phys. 74, 409–21.
  11. Heinrich L. (2007). Limit distributions of some stereological estimators in Wicksell's corpuscle problem. Image Anal. Stereol. 26, 63–71.
  12. Jensen E. B. (1984). A design-based proof of Wicksell's integral equation. J. Microsc. 136, 345–8.
  13. Lindsay B. G. (1988). Composite likelihood methods. Contemp. Math. 80, 221–39.
  14. Little R. J. A. & Rubin D. B. (2002). Statistical Analysis with Missing Data. New York: John Wiley & Sons, 2nd ed.
  15. Mallows C. L. (1985). Discussion of “Empirical distributions in selection bias models” by Vardi. Ann. Statist. 13, 204–5.
  16. McGarrity K. S., Sietsma J. & Jongbloed G. (2014). Nonparametric inference in a stereological model with oriented cylinders applied to dual phase steel. Ann. Appl. Statist. 8, 2538–66.
  17. Mecke J. & Stoyan D. (1980). Stereological problems for spherical particles. Math. Nachrichten 96, 311–7.
  18. Nychka D. W., Wahba G., Goldfarb S. & Pugh T. (1984). Cross-validated spline methods for the estimation of three-dimensional tumor size distributions from observations on two-dimensional cross sections. J. Am. Statist. Assoc. 79, 832–46.
  19. Silverman B. W., Jones M. C., Wilson J. D. & Nychka D. W. (1990). A smoothed EM approach to indirect estimation problems, with particular reference to stereology and emission tomography (with Discussion). J. R. Statist. Soc. B 52, 271–324.
  20. Tucker J. C., Chan L. H., Rohrer G. S., Groeber M. A. & Rollett A. D. (2012). Comparison of grain size distributions in a Ni-based superalloy in three and two dimensions using the Saltykov method. Scripta Materialia 66, 554–7.
  21. Van Es B. & Hoogendoorn A. (1990). Kernel estimation in Wicksell's corpuscle problem. Biometrika 77, 139–45.
  22. Vardi Y. (1985). Empirical distributions in selection bias models. Ann. Statist. 13, 178–203.
  23. Vardi Y. (1989). Multiplicative censoring, renewal processes, deconvolution and decreasing density: Nonparametric estimation. Biometrika 76, 751–61.
  24. Vardi Y. & Zhang C.-H. (1992). Large sample study of empirical distributions in a random-multiplicative censoring model. Ann. Statist. 20, 1022–39.
  25. Vardi Y., Shepp L. A. & Kaufman L. (1985). A statistical model for positron emission tomography. J. Am. Statist. Assoc. 80, 8–20.
  26. Watson G. S. (1971). Estimating functionals of particle size distributions. Biometrika 58, 483–90.
  27. Wicksell S. D. (1925). The corpuscle problem. A mathematical study of a biometric problem. Biometrika 17, 84–99.
  28. Wicksell S. D. (1926). The corpuscle problem. Second memoir. Case of ellipsoidal corpuscles. Biometrika 18, 151–72.
  29. Wilson J. D. (1989). A smoothed EM algorithm for the solution of Wicksell's corpuscle problem. J. Statist. Comp. Simul. 31, 195–221.

Articles from Biometrika are provided here courtesy of Oxford University Press
