Abstract
The Cash statistic, also known as the C statistic, is commonly used for the analysis of low-count Poisson data, including data with null counts for certain values of the independent variable. The use of this statistic is especially attractive for low-count data that cannot be combined, or re-binned, without loss of resolution. This paper presents a new maximum-likelihood solution for the best-fit parameters of a linear model using the Poisson-based Cash statistic. The solution provides a simple method to measure the best-fit parameters of a linear model for any Poisson-based data, including data with null counts. In particular, the method enforces the requirement that the best-fit linear model be non-negative throughout the support of the independent variable. The method is summarized in a simple algorithm to fit Poisson counting data of any size and counting rate with a linear model, bypassing entirely the use of the traditional $\chi^2$ statistic.
Keywords: Probability, statistics, maximum-likelihood methods, Cash statistic, parameter estimation
2010 Mathematics Subject Classifications: 62F10, 62F30
1. Introduction
The maximum-likelihood modeling of integer-valued Poisson data can be accomplished with the use of the Cash, or C, statistic, first proposed by [8]. The Cash statistic applies to a variety of counting data in use across the sciences. One example is the counting of photons as a function of energy or wavelength, as commonly done by photon-counting detectors used in astronomy (e.g. [5]). Another example is the number or percentage of votes for a candidate in different precincts or polling stations. In counting experiments such as these, the collected data are in the form of independent integer-valued variables. The behavior of these variables as a function of an independent variable (such as photon energy or number of voters in a precinct) can be modeled with the aid of the Cash statistic, which is obtained from the logarithm of the likelihood of the data with a model for the distribution of the counts.
It is well established that the asymptotic distribution of the C statistic, in the large-count limit, is a $\chi^2$ distribution (e.g. [8,13]). This limit is a result of the asymptotic convergence of a Poisson distribution to a Gaussian distribution of the same mean and variance, which occurs for large values of the Poisson mean. It is straightforward to study the distribution of the C statistic as a function of the parent mean μ, when the parent mean is specified a priori, e.g. when the fitting model has no free parameters. Such calculations are reported in [4,7,13], showing, among other results, that the expectation of C is significantly lower than the expectation of a $\chi^2$ distribution with the same number of degrees of freedom.
The use of the C statistic for integer-valued, counting Poisson data is to be preferred to the use of the more common $\chi^2$ statistic. First, even in the large-count limit, use of the $\chi^2$ fit statistic leads to a bias in the best-fit parameters, due to the approximation of the Gaussian variance with the measured Poisson counts (e.g. [12]). Second, use of the $\chi^2$ statistic often requires the combination of data points, often referred to as binning of the independent variable, to reach a sufficient number of counts in each bin. Such binning may result in an undesirable reduction in the resolution of the data, especially in the presence of sharp features in narrow intervals of the independent variable, such as emission or absorption lines. The C statistic, on the other hand, can be used on unbinned data that make use of the full resolution of the data.
Use of the C statistic also comes with a number of challenges. First, it is not known exactly how free model parameters affect the distribution of $C_{\min}$ (i.e. the statistic obtained when optimizing variable model parameters), especially in the low-count regime. A study of the distribution of $C_{\min}$ for a simple one-parameter constant model was reported by Bonamente [4], although those results do not directly apply to more complex models such as the linear model.
Another challenge is the numerical complexity of the Poisson distribution and the associated C statistic, which limits the ability to obtain analytical solutions for the best-fit parameters via the maximum-likelihood criterion. A key illustration of this challenge is that the simple linear model, which has an analytical solution for its best-fit model parameters and their covariance matrix when using the $\chi^2$ statistic (e.g. [2,3]), does not have an equally simple solution when using the C statistic.
This paper addresses the latter problem by presenting a new semi-analytical method to identify the maximum-likelihood solution of the parameters of a linear model using the C statistic. The method consists of the numerical solution of a simple analytical equation that determines the best-fit value of one of the two parameters, and the use of an analytical function to calculate the other parameter. It is also shown that not all Poisson data sets can be fit to an unconstrained linear model, since the resulting best-fit model may become negative, and therefore not usable for calculation of the Poisson-based C statistic. In those cases, a simple generalization of the linear model is proposed that enforces the non-negative requirement for the best-fit model. Such a generalization ensures that data of all sizes and counting rates can be fit with a linear model, and that such a model is unique. The results presented in this paper therefore ease the challenges presented by the numerical complexity of the Poisson distribution, by providing a simple semi-analytical method to find the best-fit parameters of the linear model, and making it possible to study the distribution of $C_{\min}$ in the low-count regime.
This paper is structured as follows. Section 2 introduces the maximum-likelihood method and the Poisson-based C statistic, Section 3 presents the equations for the model parameters and Section 4 the solution of those equations. Section 5 discusses conditions for the non-negativity of the best-fit linear model and Section 6 presents the extended linear model that ensures an acceptable model for all datasets. Finally, Section 7 contains a discussion and conclusions.
2. Methods for the maximum-likelihood analysis of Poisson data
2.1. Data model and the C statistic
The data model considered in this paper is N independent integer-valued measurements $y_i$, each Poisson-distributed and measured at a fixed value $x_i$ of the independent variable x. The data can also be viewed as originating from independent events that are sorted into N independent 'bins', each of size $\Delta x_i$ and centered at $x_i$ with $y_i$ counts. This is a common type of data for the physical sciences; for example, the independent variable x may be the wavelength of collected photons, and y the number of photons collected in a given wavelength range, binned according to the resolution of the instrument. The data can therefore be summarized as a collection of N independent two-dimensional variables
$$(x_i, y_i), \quad y_i \sim \mathrm{Poisson}(\mu_i), \quad i = 1, \dots, N,$$
where $\mu_i$ is the unknown parent mean of the Poisson distribution of the counts $y_i$, collected in a fixed range of the independent variable.
In general, the relationship between the dependent and independent variables is of the form $y = f(x)$, where $f(x)$ is an analytical function with m adjustable parameters $\theta_j$, $j = 1, \dots, m$. The likelihood of the data with the model is given by
$$\mathcal{L} = \prod_{i=1}^{N} \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!},$$
where the adjustable parameters in $f(x)$ are optimized so that the likelihood is maximized, for the given dataset. Instead of maximizing $\mathcal{L}$ directly, it is convenient to minimize the function
| (1) |
where
is a quantity that is independent of the model, and
The statistic C is known as the Cash or C statistic, originally proposed by Cash [8] and Baker and Cousins [1] to model and analyze X-ray observations of astronomical sources. For a model with m free parameters, the minimization of the C statistic yields m equations, in general non-linear, that need to be solved to obtain the maximum-likelihood best-fit estimates of the m model parameters.
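As an illustration, the statistic can be evaluated numerically once a model prediction $\mu_i$ is available for each bin. The sketch below assumes the commonly used form $C = 2\sum_i [\mu_i - y_i + y_i \ln(y_i/\mu_i)]$, with the logarithmic term set to zero for bins with $y_i = 0$. The function name `cash_statistic` is hypothetical, and the expression adopted in this paper may differ from this form by model-independent terms.

```python
import math


def cash_statistic(counts, means):
    """Assumed form: C = 2 * sum(mu - y + y * ln(y / mu)), with y * ln(y) = 0 when y = 0."""
    total = 0.0
    for y, mu in zip(counts, means):
        # A Poisson mean cannot be negative, and must be positive where counts exist.
        if mu < 0 or (mu == 0 and y > 0):
            raise ValueError("model mean must be positive wherever counts are recorded")
        total += mu - y
        if y > 0:
            total += y * math.log(y / mu)
    return 2.0 * total


# Three bins, one of them with null counts; only the first bin contributes here.
print(cash_statistic([0, 1, 2], [0.5, 1.0, 2.0]))
```

Bins with null counts contribute $2\mu_i$ to this sum, which is why no re-binning is needed before the statistic is evaluated.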
2.2. Linear models
The linear model is a simple and commonly used relationship between two variables. The customary parameterization of a linear relationship between two variables x and y is $y = a + bx$, with a and b as the two adjustable parameters. Another convenient and equivalent parameterization, suggested by Scargle et al. [17], is of the form
$$f(x) = \lambda\,\bigl(1 + a\,(x - x_A)\bigr) \tag{2}$$
with λ and a as the two adjustable parameters, and $x_A$ a fixed fiducial value of the x variable. As will be shown, this parameterization is convenient when taking the derivative of the terms in Equation (1), since it leads to a separation between the two parameters λ and a. This is the parameterization used in this paper.
The continuous function $f(x)$ is related to the parent mean $\mu_i$ of each Poisson variable via an integral over the length of the i-th bin,
$$\mu_i = \mu(x_i) = \int_{x_i - \Delta x_i/2}^{x_i + \Delta x_i/2} f(x)\,dx = f(x_i)\,\Delta x_i \tag{3}$$
where $\mu(x)$ is a step-wise function describing the Poisson mean for each bin, and the last equality applies because of the linearity of $f(x)$. It is therefore recognized that $f(x)$ is a non-negative density function in units of counts per unit x (i.e. not just counts), while its integral over a range is the predicted Poisson mean in that bin, in units of counts. Therefore, the two functions $f(x)$ and $\mu(x)$ will differ from each other according to the size of the bins, which is allowed to be non-uniform. It is important to stress that the function $\mu(x)$ must be non-negative, since it would not be meaningful to have a Poisson variable with a negative parent mean. This non-negativity requirement places a number of constraints on the solutions of the maximum-likelihood equations, which are discussed in Section 5.
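The relation between the density and the bin means can be checked with a short sketch. It assumes the parameterization $f(x) = \lambda\,(1 + a\,(x - x_A))$ and uses the fact that, for a linear density, the integral over a bin equals the density at the bin center times the bin width. The function names and the sample values are hypothetical.

```python
def linear_density(x, lam, a, x_a):
    """Assumed linear density f(x) = lam * (1 + a * (x - x_a)), in counts per unit x."""
    return lam * (1.0 + a * (x - x_a))


def bin_means(centers, widths, lam, a, x_a):
    """Poisson mean of each bin: density at the bin center times the bin width."""
    return [linear_density(x, lam, a, x_a) * dx for x, dx in zip(centers, widths)]


# Three contiguous bins of non-uniform width covering the range [0, 4].
centers = [0.5, 2.0, 3.5]
widths = [1.0, 2.0, 1.0]
mu = bin_means(centers, widths, lam=2.0, a=0.5, x_a=0.0)
print(mu, sum(mu))
```

With these values the means add up to $\lambda R\,(1 + aR/2)$ for $R = 4$, which is the integral of the density over the whole range.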
2.3. Generalized linear models and other considerations for count data
Integer-valued count data of the type considered in this paper can be modeled with alternative statistics that afford more flexibility than the single-parameter Poisson distribution, in particular with regards to over- or under-dispersion of the data (e.g. [6,11,18,19]). Moreover, the regression with one or several independent variables may often require more complex models than a simple linear model. Generalized linear models and vector-generalized linear models provide a comprehensive and flexible framework for the regression of data, including a natural way to account for non-negative Poisson means via a suitable link function that relates the Poisson mean to the data and model parameters (e.g. [9,14,16,21,22]). Within the context of generalized linear models, a convenient link function would be in the form of , where the logarithm of the Poisson mean μ, instead of the mean itself, is modeled with a linear function (as in, e.g. [10]). Such log-linear models would ensure that the Poisson mean is always positive (e.g. see chapter 6 of [14]).
There are two main reasons for the present investigation of a simple linear regression using the standard Poisson distribution, instead of more versatile models or distributions. First is the goal to obtain an analytical, and therefore computationally efficient, solution for the best-fit parameters of the linear model for count data. Currently, the only available analytical solution for the maximum-likelihood fit of the linear model is for Gaussian-distributed variables, and this method is not accurate for low-count integer-valued data (e.g. [4]). Second, there may be scientific reasons to prefer a simple linear model versus, e.g. a log-linear model or other more complex models, particularly when the data or an underlying parent model suggest a direct linear relation between the dependent and independent variables (e.g. [15,20]). The combination of these practical and scientific reasons make it interesting to seek an analytical solution of the basic linear regression with Poisson data.
3. Maximum likelihood solutions for the parameters of the linear model
3.1. The statistic for the linear model
The linear model of Equation (2) is illustrated in Figure 1. To evaluate the statistic of Equation (1) with the linear model of Equation (2), start with
| (4) |
where $R = x_B - x_A$ is the range of the x variable, and it is assumed that the data cover the entire range R. There are situations where measurements of the dependent variable y are missing for certain intervals of the independent variable x, for example because measurements are not possible or because they are ignored in the analysis. An interval of the independent variable where data are missing, or are otherwise not used in the analysis, will be referred to as a gap. If the data contain gaps in the x variable, the limits of integration of Equation (4) will change. Section 6.3 describes the simple modification required to analyze data that contain such gaps.
Figure 1.
Linear model according to Equation (2). In this illustration, the functions , in units of counts per unit x, and , in units of total counts in the bin, follow one another closely because a bin size value of was used.
The second term of the statistic is simply
where M is the total number of counts, and the final term is
where Equation (3) was used. The statistic for the linear model of Equation (2) is therefore
| (5) |
where
is a model-independent term that does not have an effect in the subsequent minimization of the statistic. Note that the binning of the data is not required to be uniform, as will be illustrated in subsequent examples.
3.2. Equations for the maximum-likelihood solutions
Minimization of the statistic in Equation (5) is obtained via
which leads to
$$\lambda = \frac{M}{R\,\left(1 + \dfrac{aR}{2}\right)} \tag{6}$$
and
Substituting Equation (6) to eliminate λ leads to
and finally to
| (7) |
which is the equation to solve for the values of the a parameter. Equation (7) may be rearranged as
It is thus convenient to define
$$F(a) = 1 + \frac{aR}{2} - \frac{MR}{2\,g(a)} \tag{8}$$
as the function whose zeros are solutions of Equation (7), with $g(a)$ defined as
$$g(a) = \sum_{i=1}^{N} \frac{y_i\,(x_i - x_A)}{1 + a\,(x_i - x_A)} \tag{9}$$
The problem of finding solutions for the parameter a has therefore been cast as finding the zeros of a function $F(a)$, which uses the function $g(a)$ of Equation (9). A key property for locating the zeros of $F(a)$ is that the zeros of $g(a)$ are points of singularity for $F(a)$, as will be shown in detail in Section 4.
In summary, $F(a) = 0$ and Equation (6) are the two equations to solve to find the maximum-likelihood estimators a and λ of the linear model of Equation (2). The two equations are uncoupled, and therefore the burden is limited to finding a solution of $F(a) = 0$. Then, Equation (6) is used to find λ. Notice that, in deriving Equations (6) and (7), no constraints were enforced to ensure that the best-fit model is non-negative, which is necessary for the applicability of the Poisson statistics and for the calculation of the C statistic. These constraints are presented in Section 5.
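The two uncoupled equations can be sketched in code. The block below assumes the parameterization $f(x) = \lambda\,(1 + a\,(x - x_A))$, for which setting the derivatives of the statistic to zero gives $\lambda = M / [R\,(1 + aR/2)]$ and $F(a) = 1 + aR/2 - MR/(2\,g(a))$ with $g(a) = \sum_i y_i (x_i - x_A) / (1 + a\,(x_i - x_A))$. The function names and the normalization of $F$ are assumptions of this sketch.

```python
def g(a, centers, counts, x_a):
    """Assumed g(a): sum over non-null bins of y * d / (1 + a * d), with d = x - x_a."""
    return sum(y * (x - x_a) / (1.0 + a * (x - x_a))
               for x, y in zip(centers, counts) if y > 0)


def F(a, centers, counts, x_a, R):
    """Assumed F(a), whose zeros are the candidate maximum-likelihood values of a."""
    M = sum(counts)
    return 1.0 + 0.5 * a * R - 0.5 * M * R / g(a, centers, counts, x_a)


def lam_from_a(a, counts, R):
    """Normalization lam obtained analytically once a is known."""
    return sum(counts) / (R * (1.0 + 0.5 * a * R))


# Four unit-width bins on [0, 4] with one count in the first and last bins.
centers, counts = [0.5, 1.5, 2.5, 3.5], [1, 0, 0, 1]
print(F(0.0, centers, counts, x_a=0.0, R=4.0))  # symmetric data: a = 0 is a zero
print(lam_from_a(0.0, counts, R=4.0))           # flat model with 0.5 counts per unit x
```

For this symmetric dataset the flat model is recovered, with a total predicted number of counts equal to M.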
4. Analytical properties of the maximum-likelihood solution of the linear model
The maximum-likelihood estimates of the parameters λ and a are obtained by first finding the zeros of the function $F(a)$ defined by Equation (8). For this purpose, it is necessary to establish a few analytical properties of the functions $g(a)$ and $F(a)$. These properties will be used to study the location and properties of the zeros of the function $F(a)$. It is necessary to discuss explicitly the simple case of data with M = 1 count before presenting general results for $M \geq 2$. The case of M = 0 counts is not interesting, since it represents a dataset with no positive measurements throughout the range of the independent variable.
4.1. Case of M = 1
When M = 1, or just one event ($y_j = 1$) recorded in a bin centered at $x_j$, one obtains
$$g(a) = \frac{x_j - x_A}{1 + a\,(x_j - x_A)},$$
leading to
$$F(a) = 1 - \frac{R}{2\,(x_j - x_A)},$$
which is constant independent of a. The conclusion is that $F(a) = 0$ has no solutions when the data have only one count, and it is therefore not possible to find a maximum-likelihood solution for the linear model with M = 1. A simple interpretation for this finding is that it is not possible to constrain a two-parameter model with just one non-null data point. Further discussion is provided in Section 7.
4.2. General case of $M \geq 2$
It is possible to find certain properties of $g(a)$ and $F(a)$ that apply in general. The properties will lead to a general criterion to identify solutions of $F(a) = 0$.
Property 4.1 Properties of the function $g(a)$ —
The function $g(a)$, according to Equation (9), is the sum of n terms of type
$$\frac{y_i\,(x_i - x_A)}{1 + a\,(x_i - x_A)}, \quad i = 1, \dots, n,$$
where n is the number of bins with $y_i \geq 1$, and $i = 1, \dots, n$ represents the bins with non-null counts; when bins have no more than one count, then n = M. Therefore, the function $g(a)$ has n points of singularity
$$a_i = -\frac{1}{x_i - x_A}, \quad i = 1, \dots, n,$$
that fall in the range $-1/(x_1 - x_A) \leq a_i \leq -1/(x_N - x_A)$, where $x_1$ and $x_N$ are the centers of the first and last bins. Near the points of singularity,
$$\lim_{a \to a_i^{\pm}} g(a) = \pm\infty.$$
It is also immediate to see that $g'(a) < 0$ for all points where the function is continuous. Therefore, the function decreases monotonically from $+\infty$ to $-\infty$ between two consecutive points of singularity. Moreover, the asymptotic limits are
$$\lim_{a \to \pm\infty} g(a) = 0^{\pm}.$$
As a result, the function $g(a)$ has n−1 zeros, each between consecutive points of singularity, as illustrated in Example 4.2 and Figures 2 and 3. The zeros of $g(a)$ are points of singularity for $F(a)$.
In particular, when M = 2 with $y_i = y_j = 1$ and no counts in any of the other bins, then the zero of $g(a)$ can be calculated analytically as
$$a = -\frac{(x_i - x_A) + (x_j - x_A)}{2\,(x_i - x_A)(x_j - x_A)}.$$
For the general case of M>2, the zeros of $g(a)$ must be calculated numerically, as explained in the following.
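The M = 2 case can be worked out explicitly. The derivation below is a sketch that assumes $g(a) = \sum_i y_i d_i / (1 + a\,d_i)$, where the shorthand $d_i = x_i - x_A$ is introduced here for convenience and is not notation from the paper.

```latex
\begin{align*}
g(a) = \frac{d_i}{1 + a\,d_i} + \frac{d_j}{1 + a\,d_j} = 0
  &\;\Longrightarrow\; d_i\,(1 + a\,d_j) + d_j\,(1 + a\,d_i) = 0 \\
  &\;\Longrightarrow\; a = -\frac{d_i + d_j}{2\,d_i\,d_j}
     = -\frac{1}{2}\left(\frac{1}{d_i} + \frac{1}{d_j}\right).
\end{align*}
```

Under these assumptions the zero is the arithmetic mean of the two points of singularity $-1/d_i$ and $-1/d_j$, and therefore lies between them. For instance, $d_i = 1$ and $d_j = 3$ give $a = -2/3$, between $-1$ and $-1/3$.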
Figure 2.
(Left:) Function $g(a)$ for a representative data set with M = 2 and 100 data points. (Right:) Function $F(a)$ for the same data set.
Figure 3.
(Left:) Function $g(a)$ for a representative data set with M = 3 and 100 data points. (Right:) Function $F(a)$ for the same data set.
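Property 4.2 Behavior of $F(a)$ near the points of singularity —
Since $g(a)$ is continuous with $g'(a) < 0$ between its n points of singularity, $g(a) \to 0^{+}$ immediately to the left of each singularity of $F(a)$ and $g(a) \to 0^{-}$ immediately to the right. With $a_0$ any of the n−1 points of singularity of $F(a)$, this implies that
$$\lim_{a \to a_0^{-}} F(a) = -\infty, \qquad \lim_{a \to a_0^{+}} F(a) = +\infty.$$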
Property 4.3 Asymptotic limit of $F(a)$ —
The asymptotic limit of $F(a)$ at $a \to \pm\infty$, defined as
$$F_\infty = \lim_{a \to \pm\infty} F(a),$$
can be evaluated via L'Hôpital's rule applied to an associated indeterminate form of the type 0/0. The result can be proven with a few steps of algebra, by turning the sum over the N bins into an equivalent sum over the individual M counts. The asymptotic limit of $F(a)$ therefore becomes
$$F_\infty = 1 - \frac{R}{2M} \sum_{j=1}^{M} \frac{1}{x_j - x_A} \tag{10}$$
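The asymptotic value can be verified numerically. The sketch below assumes $F(a) = 1 + aR/2 - MR/(2\,g(a))$ and the limit $F_\infty = 1 - \frac{R}{2M}\sum_i y_i/(x_i - x_A)$, in which the sum over individual counts is written as a count-weighted sum over bins. The function names and the sample dataset are hypothetical.

```python
def F(a, centers, counts, x_a, R):
    """Assumed F(a) = 1 + a*R/2 - M*R / (2*g(a)) for the linear model."""
    M = sum(counts)
    g = sum(y * (x - x_a) / (1.0 + a * (x - x_a))
            for x, y in zip(centers, counts) if y > 0)
    return 1.0 + 0.5 * a * R - 0.5 * M * R / g


def F_infinity(centers, counts, x_a, R):
    """Assumed asymptotic value of F(a), identical for a -> +inf and a -> -inf."""
    M = sum(counts)
    return 1.0 - R / (2.0 * M) * sum(y / (x - x_a)
                                     for x, y in zip(centers, counts) if y > 0)


centers, counts = [0.5, 1.5, 2.5, 3.5], [1, 0, 0, 1]
limit = F_infinity(centers, counts, x_a=0.0, R=4.0)
# F evaluated far from the singularities approaches the same value on both sides.
print(limit, F(1e6, centers, counts, 0.0, 4.0), F(-1e6, centers, counts, 0.0, 4.0))
```

The sign of this value is what decides on which side of the singularities the external zero is sought.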
Property 4.4 Sign of —
The derivative is calculated according to
(11) For ,
in which the sum is over all individual events, with some $x_j$ identical to each other if there are bins with more than one count. Using the Cauchy-Schwarz inequality
leads to
(12)
Finally, substituting Equation (12) into Equation (11) leads to the conclusion that $F'(a) \leq 0$ for all points of continuity of $F(a)$.
These properties of the function $F(a)$ are also illustrated in Example 4.2 and Figures 2 and 3. The properties of $g(a)$ and $F(a)$ can be used to state a general criterion to locate all solutions of the equation $F(a) = 0$.
Lemma 4.1 Location of zeros of $F(a)$ —
The function $F(a)$ has n−1 zeros, where $n \geq 2$ is the number of bins with non-null counts. Of these, n−2 zeros are found between the n−1 points of singularity of $F(a)$, which are also zeros of $g(a)$. The remaining zero is found either to the left of the smallest point of singularity, if the asymptotic limit is $F_\infty > 0$, or to the right of the largest singularity, if $F_\infty < 0$.
Proof.
The result is a direct consequence of the presence of n points of singularity for $g(a)$ (Property 4.1), the negative sign of $F'(a)$ between points of singularity (Property 4.4) and the asymptotic limit of $F(a)$ at the points of singularity (Property 4.2).
Properties of $g(a)$ and $F(a)$ are illustrated in the following example, which examines the behavior of the two functions for two simple datasets with M = 2 and M = 3.
Example 4.2 Two datasets with M = 2 and M = 3 —
Two sample datasets with M = 2 and M = 3 are shown respectively in Figures 2 and 3. For the M = 2 dataset, with n = 2 bins with non-null counts, the function $F(a)$ has just one point of discontinuity, which is also the zero of $g(a)$. This point of discontinuity divides the domain of a into two intervals, with $F(a)$ monotonically decreasing within these intervals, as shown in Figure 2. For the M = 3 dataset, with n = 3 bins with non-null counts, $F(a)$ has n−1 = 2 points of discontinuity corresponding to the two zeros of $g(a)$, which were in turn found between the n = 3 points of singularity of $g(a)$, as shown in Figure 3.
The n−1 zeros of $F(a)$ are all possible solutions of the maximum-likelihood method for the linear model. The following section discusses whether these solutions are acceptable, in the sense that they produce a model that is non-negative throughout the domain of the x variable.
5. Acceptable solutions for the best-fit parameters of the linear model
In Section 4 it was shown that there are several possible solutions for the maximum-likelihood parameters of the linear model of Equation (2). In particular, data with M total counts, distributed over n of the N available bins, have n−1 possible values of a that are a solution of the maximum-likelihood equation $F(a) = 0$, with the corresponding value of λ provided by Equation (6). This section addresses the additional requirement that a model be non-negative in all bins, i.e. that a solution be acceptable, so that Poisson statistics apply and the C statistic can be calculated.
Definition 5.1 Acceptable solution of $F(a) = 0$ —
A solution of the equation $F(a) = 0$ is said to be acceptable if it leads to a non-negative model throughout the support of the independent variable. Specifically, the function $\mu(x)$ must satisfy the condition that $\mu(x_i) \geq 0$ in all bins, so that the parent mean of the Poisson distributions is always non-negative.
It will be shown in this section that there is at most one acceptable solution for any Poisson data set. Cases without an acceptable solution will be examined in Section 6, where a simple generalization of the linear model is provided that ensures one and only one acceptable solution for the fit of any data set to a linear model.
5.1. General conditions for acceptability
Given that the model is linear, the condition of acceptability is satisfied by simply requiring that the Poisson mean for the first and last bins, and , are both non-negative,
Notice how the model may still become negative in a portion of either the first or the last bin, but the linearity of the model simply requires that at the mid-point of the bin be non-negative, in order to ensure that the Poisson mean for the bin is non-negative.
Substituting Equation (6), the conditions become a function of a alone,
| (13) |
This equation can be used to find a range of the variable a that contains acceptable solutions. Therefore, a solution of is acceptable if and only if it satisfies Equation (13). This property leads to the following result regarding acceptable solutions:
Lemma 5.2 Necessary and sufficient condition for the acceptability of a solution of —
A solution of is acceptable if and only if it is found outside of the interval
Proof.
The conditions of Equation (13) can be used to find values of the variable a that are acceptable solutions of . For , i.e. when the denominators in Equation (13) are negative, the two conditions are satisfied when , since . Likewise, for , the two conditions are satisfied when . Therefore, acceptable solutions can be found in the range
(14)
Solutions with lead to a model that becomes negative in some of the bins, and therefore they are not acceptable. Figure 4 shows the function and illustrates the range of acceptable values for the parameters.
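The acceptability test can be sketched as a simple interval check. The block assumes the parameterization $f(x) = \lambda\,(1 + a\,(x - x_A))$ with $\lambda = M/[R\,(1 + aR/2)]$, contiguous bins starting at $x_A$, and the resulting interval of unacceptable values $(-2/\Delta x_1,\, -1/(R - \Delta x_N/2))$, whose endpoints are the slopes at which the mean of the first or last bin vanishes. The function names are hypothetical, and treating the endpoints themselves as acceptable is a choice of this sketch.

```python
def unacceptable_interval(first_width, last_width, R):
    """Open interval of a for which the first or last bin mean is negative (assumed form)."""
    return -2.0 / first_width, -1.0 / (R - 0.5 * last_width)


def is_acceptable(a, first_width, last_width, R):
    """True when a lies outside the interval of unacceptable values."""
    low, high = unacceptable_interval(first_width, last_width, R)
    return a <= low or a >= high


def end_bin_densities(a, M, first_width, last_width, R):
    """Model density at the centers of the first and last bins, for a direct check."""
    lam = M / (R * (1.0 + 0.5 * a * R))
    return lam * (1.0 + 0.5 * a * first_width), lam * (1.0 + a * (R - 0.5 * last_width))


# Unit-width first and last bins on a range R = 4, with M = 2 counts.
for a in (-3.0, -1.0, 0.0):
    print(a, is_acceptable(a, 1.0, 1.0, 4.0), end_bin_densities(a, 2, 1.0, 1.0, 4.0))
```

The direct evaluation of the end-bin densities agrees with the interval test: the density is negative in the first bin for $a = -1$, and non-negative in both end bins for $a = -3$ and $a = 0$.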
Figure 4.

(Left:) Parameter λ as a function of the parameter a, and range of acceptability of a. Inside the interval of Lemma 5.2, the parameters result in a linear model that becomes negative in some of the bins, and is therefore not acceptable for use with the Poisson distribution. The smallest and largest points of singularity of $g(a)$ also correspond to the boundaries of this range, according to Property 4.1. The zeros of $g(a)$, also points of singularity for $F(a)$, are therefore inside this range. The x axis was plotted with the symlog option that allows a near-logarithmic scaling across a value of zero.
Example 5.3 M = 2 data with no acceptable solution —
The acceptability of the maximum-likelihood solution is illustrated with the data used for Figure 2. The only singularity of the function is
which is , and with a positive asymptotic value of , as shown in Figure 2. The solution is a = −0.077, which falls in the range of unacceptable solutions. In fact, the corresponding results in a best-fit model that is negative in some of the initial bins, e.g. . This best-fit model cannot be used to calculate the goodness-of-fit statistic, and therefore cannot be accepted as a maximum-likelihood solution.
5.2. General method to locate acceptable solutions
Lemma 5.4 Necessary condition for the acceptability of solutions —
Solutions of $F(a) = 0$ within points of singularity of $F(a)$ are always unacceptable.
Proof.
This condition applies to data with M>2 and n>2 unique bins with non-zero counts, so that $F(a)$ has at least two points of singularity. In this case, n−2 of the n−1 solutions of $F(a) = 0$ are found between the n−1 points of singularity of $F(a)$, which are the zeros of the function $g(a)$. According to Property 4.1, the zeros of $g(a)$ are located between the n points of singularity of $g(a)$, given by
$$a_i = -\frac{1}{x_i - x_A}, \quad i = 1, \dots, n \tag{15}$$
where $x_i$ is the coordinate of each of the unique bins where non-zero counts are recorded. According to Lemma 5.2, these n−2 solutions of $F(a) = 0$ fall in the interval of unacceptability.
Lemma 5.4 states that all zeros of $F(a)$ that are within points of singularity may be discarded as unacceptable. The only possibility for an acceptable solution is the zero that is located outside of the range of the points of singularity, although such a zero is not guaranteed to be acceptable. Accordingly, the following definition is made:
Definition 5.5 External solution of $F(a) = 0$ —
A solution of $F(a) = 0$ is said to be external if it falls outside of the range of the points of singularity of $F(a)$.
An external solution is therefore found either to the left of the first point of singularity of $F(a)$, if the asymptotic value $F_\infty > 0$, or to the right of the last, if $F_\infty < 0$. For M = 2, this is the only solution of $F(a) = 0$, and the point of singularity of $F(a)$ is calculated according to the equation provided at the end of Property 4.1.
Lemma 5.6 Necessary and sufficient conditions for the acceptability of an external solution —
An external solution of $F(a) = 0$ is acceptable if and only if the following conditions are met, according to the sign of the asymptotic value $F_\infty$:
(16)
Proof.
This property is simply based on the continuity of the function between points of singularity, and on Lemma 5.2, which established that acceptable values of the parameter a are outside of the interval .
(a) if , the solution of is to the left of the point of singularity. Given that , the solution will fall in the range of acceptability, viz., , if .
(b) Likewise, if , the solution is to the right of the point of singularity, and the solution is acceptable if .
The condition is also necessary. In fact, if Equation (16) is not satisfied, e.g. for , then the zero will be in the region of unacceptability.
This necessary and sufficient condition can be immediately applied to data that have non-zero counts, and thus points of singularity of , at the extremes of the range.
Corollary 5.7 Sufficient conditions for data with non-zero counts in first or last bin —
If a data set with satisfies either of the two conditions
(17) then the external solution of the equation is acceptable.
Proof.
According to Equation (8), at points of singularity for . A non-zero count in the first bin leads to a singularity of at , and therefore . Therefore, according to Lemma 5.6, if , there is an acceptable solution to the left of . Similar considerations are applicable to the case of a non-zero count in the last bin, where a singularity of occurs instead at , where . In this case, if , the external solution of to the right of the last singularity of is acceptable, again according to Lemma 5.6.
Notice how Corollary 5.7 does not ensure an acceptable solution simply because either the last or the first bin has non-null counts. In fact, the presence of an acceptable solution is conditioned also on the sign of $F_\infty$. For example, a data set with a non-null last bin but with a positive $F_\infty$ will not have an acceptable solution to the right of the last singularity. Finally, the two earlier lemmas can be used to state the uniqueness of the maximum-likelihood solution for the linear model.
Lemma 5.8 Uniqueness of an acceptable solution of $F(a) = 0$ —
If there is an acceptable solution of $F(a) = 0$, this solution is unique.
Proof.
This property is an immediate consequence of the fact that, of the n−1 solutions of $F(a) = 0$, the n−2 solutions within points of singularity cannot be acceptable, as per Lemma 5.4. Only the remaining external solution may be acceptable, according to Lemma 5.6.
Example 5.9 Example of data with M = 5 and an acceptable solution —
Figure 5 shows the and functions for a dataset with M = 5 counts in 5 equally spaced bins ( ), and therefore n = 5. The function has n = 5 points of singularity and n−1 = 4 zeros, which correspond to the 4 points of singularity of . There are also 4 zeros of the function , of which n−2 = 3 correspond to unacceptable solutions. The asymptotic value if , and therefore the remaining external zero is to the right of the last singularity. At the end point of the range of acceptability, the function is , and therefore the last zero leads to an acceptable solution.
The data and all models are shown in Figure 6. Notice how the solutions of $F(a) = 0$ that are not acceptable lead to models that become negative; these models cannot be used for the C statistic, and need to be rejected. The only acceptable model is shown as a solid line, and the corresponding values of the C statistic for each bin are shown in the right panel.
Figure 5.
Functions and for the dataset presented in Example 5.9.
Figure 6.
(Left) Best-fit linear models for the M = 5 data presented in Example 5.9. There are 4 solutions of , of which the first three lead to models that become negative somewhere in the x range; the acceptable model corresponds to the largest solution. (Right) Contributions to statistic for each of the N = 200 bins, for a total of .
The results presented in this section can be summarized by a simple algorithm that can be used to determine whether there is an acceptable solution of the equation , and to calculate it, when it exists.
Remark 5.1 Algorithm to determine acceptable best-fit parameters of Equation (2) —
Consider a dataset with N bins, a range R of the independent variable between $x_A$ and $x_B$, and a total number of integer-valued counts M, with n bins with non-null counts. The existence and value of the best-fit parameters for the linear model of Equation (2) can be determined according to the following steps:
If , there is no acceptable solution.
Calculate the n points of singularity for $g(a)$, given analytically by Equation (15).
Numerically calculate the n−1 zeros of $g(a)$ between points of singularity. These zeros are points of singularity for $F(a)$.
Calculate the asymptotic value $F_\infty$, given analytically by Equation (10).
(Optional) Numerically calculate the n−2 zeros of $F(a)$, found between points of singularity of $F(a)$ (also zeros of $g(a)$). These zeros always lead to unacceptable solutions.
Numerically calculate the remaining external zero of $F(a)$, either to the left of the first point of singularity (if $F_\infty > 0$), or to the right of the last singularity (if $F_\infty < 0$).
Determine the acceptability of the external solution, using the condition of Lemma 5.6 that corresponds to the sign of $F_\infty$.
The numerical solution of both equations $g(a) = 0$ and $F(a) = 0$ is facilitated by the continuity of the two functions between the known points of singularity, or between the outermost points of singularity and $\pm\infty$. An efficient and accurate numerical routine is provided, for example, by SciPy's root_scalar, with the brentq method. The method requires the specification of an interval, or bracket, where the solution is sought. This is either an interval between two adjacent points of singularity, or an open interval either below the first singularity, or above the last singularity. For example, a zero of $g(a)$ can be sought in the range $(a_i + \varepsilon,\, a_{i+1} - \varepsilon)$, where $a_i$ and $a_{i+1}$ are two consecutive points of singularity of $g(a)$. This bracket requires a small value ε, to be determined according to the separation of the data points, to ensure that the function at the two extremes of the bracket has opposite signs.
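The steps above can be condensed into a short sketch. It assumes contiguous bins starting at $x_A$, the forms $F(a) = 1 + aR/2 - MR/(2\,g(a))$ and $F_\infty = 1 - \frac{R}{2M}\sum_i y_i/(x_i - x_A)$, and the interval of unacceptable slopes $(-2/\Delta x_1,\, -1/(R - \Delta x_N/2))$. Two shortcuts are specific to this sketch and not part of the algorithm as stated: only the external zero is sought, bracketed from the boundary of the acceptable range so that the zeros of $g(a)$ are never computed, and a plain bisection replaces brentq to keep the example free of dependencies. All function names are hypothetical.

```python
def bisect(func, left, right, tol=1e-12, max_iter=200):
    """Plain bisection; func is assumed to change sign across [left, right]."""
    f_left = func(left)
    if f_left == 0.0:
        return left
    for _ in range(max_iter):
        mid = 0.5 * (left + right)
        f_mid = func(mid)
        if f_mid == 0.0 or (right - left) < tol:
            return mid
        if (f_mid > 0.0) == (f_left > 0.0):
            left, f_left = mid, f_mid
        else:
            right = mid
    return 0.5 * (left + right)


def fit_linear_poisson(centers, widths, counts, x_a):
    """Return (lam, a) for the acceptable external solution, or None if there is none."""
    R = float(sum(widths))
    M = sum(counts)
    d = [x - x_a for x, y in zip(centers, counts) if y > 0]  # distances of non-null bins
    w = [y for y in counts if y > 0]
    if len(d) < 2:
        return None  # a single non-null bin makes F(a) constant (includes M = 1)

    def F(a):
        try:
            g = sum(y * di / (1.0 + a * di) for y, di in zip(w, d))
            return 1.0 + 0.5 * a * R - 0.5 * M * R / g
        except ZeroDivisionError:
            # a sits exactly on a singularity of g, where 1/g -> 0
            return 1.0 + 0.5 * a * R

    F_inf = 1.0 - R / (2.0 * M) * sum(y / di for y, di in zip(w, d))
    low = -2.0 / widths[0]                 # first-bin mean vanishes at this slope
    high = -1.0 / (R - 0.5 * widths[-1])   # last-bin mean vanishes at this slope

    if F_inf < 0.0:                        # external zero to the right
        if F(high) < 0.0:
            return None                    # zero falls inside the unacceptable interval
        step = max(1.0, abs(high))
        while F(high + step) >= 0.0:       # widen until the sign changes
            step *= 2.0
        a = bisect(F, high, high + step)
    elif F_inf > 0.0:                      # external zero to the left
        if F(low) > 0.0:
            return None
        step = max(1.0, abs(low))
        while F(low - step) <= 0.0:
            step *= 2.0
        a = bisect(F, low - step, low)
    else:
        return None                        # degenerate case: no external zero
    return M / (R * (1.0 + 0.5 * a * R)), a


centers, widths = [0.5, 1.5, 2.5, 3.5], [1.0, 1.0, 1.0, 1.0]
print(fit_linear_poisson(centers, widths, [1, 0, 0, 1], x_a=0.0))  # flat model
print(fit_linear_poisson(centers, widths, [0, 3, 1, 0], x_a=0.0))  # no acceptable fit
```

The bracket starts at the boundary of the acceptable range because, under the assumptions above, $F(a)$ is continuous and monotonically decreasing beyond the outermost singularities, so a sign check at the boundary decides acceptability before any root-finding is attempted.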
5.3. Asymptotic data requirements for acceptable solutions
This section examines when data sets with a large number of counts have an acceptable solution. It will be shown that, when the counts are distributed uniformly across the support, data with large M will always have an acceptable solution. In general, however, it is possible to find datasets with large M that do not have an acceptable solution, depending on the distribution of counts. This observation will lead to a generalization of the simple model of Equation (2), presented in Section 6. First, it is necessary to investigate how the asymptotic value of is affected by the distribution of the detected counts.
Property 5.1 Properties of —
The asymptotic value of is given by Equation (10), and it is negative if
(18)
and positive otherwise. Given that , each term has a value if is above the midpoint of the range, and a value if is below the midpoint. The left-hand side of Equation (18) is the sample mean of the variable .
It can now be established that, for data with a large number of bins and a uniform distribution of the counts, the asymptotic value of is negative. Moreover, when the number of counts M is also large, the external solution to the right of the last singularity will be acceptable.
Lemma 5.10
For a large number of bins N and a uniform distribution of counts,
where is the expectation based on a parent distribution for the position of the i-th count. Moreover, when M is large, the asymptotic value of is negative.
Proof.
Assuming bins of uniform width, the range is . The distance of the i-th count from the initial point of the range is , and it can be written as
where . When N is large and the counts are uniformly distributed in the N bins, it is possible to treat f as a continuous and uniformly distributed random variable in the range , thus with unit probability distribution function. Accordingly, the expectation of the inverse of the distance can be approximated as
Therefore the expectation is asymptotically larger than 2/R for large N. Moreover, for a large number of counts M, the law of large numbers ensures that the sample average of tends to its expectation. Therefore, as M increases, the asymptotic value of tends to be negative, and the external solution of will be found to the right of the last point of singularity for .
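The claim that the mean inverse distance exceeds 2/R for uniformly distributed counts can be checked numerically with a short sketch. It assumes equal-width bins, one unit of weight per bin, and distances measured from the start of the range to the bin centers; the function name mean_inverse_distance is introduced here only for illustration.

```python
import numpy as np


def mean_inverse_distance(n_bins, R=1.0):
    """Mean of 1/d over the centers of n_bins equal-width bins spanning R.

    d is the distance of each bin center from the start of the range
    (an assumption of this sketch).
    """
    width = R / n_bins
    d = (np.arange(n_bins) + 0.5) * width
    return np.mean(1.0 / d)


R = 1.0
for n_bins in (10, 100, 1000):
    # The product below is to be compared with 2, the threshold of the proof.
    print(n_bins, mean_inverse_distance(n_bins, R) * R)
```

Under these assumptions the product grows slowly (logarithmically) with the number of bins, which is consistent with the asymptotic statement of the lemma.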
It is now possible to state a sufficient condition that applies to uniformly distributed counts in the large-count regime.
Lemma 5.11 Sufficient condition for an acceptable solution —
For large M and N with uniformly distributed counts and a non-null count in the last bin, the external solution of is acceptable and it is found to the right of the last singularity.
Proof.
For data with non-null counts in the last bin, i.e. , the last singularity of occurs at . At points of singularity for , , according to Equation (8), and therefore . Notice that the point marks the boundary of the region of acceptability for the solutions of , according to Lemma 5.2.
The last singularity for , also a zero of , will thus occur at a point which is to the left of , and the continuity of to the right of the last singularity ensures that remains positive between and . Also, Lemma 5.10 ensures that the asymptotic value of is negative. Therefore there is a zero of to the right of , and this external zero is acceptable, according to Lemma 5.2.
These asymptotic results apply to a uniform distribution of counts, which is a very restrictive condition. When the distribution of counts is not uniform, even large-M datasets may not have an acceptable model. It goes beyond the scope of this paper to seek additional sufficient conditions for convergence, given the number of variables at play (in particular, the number of counts M, the number of bins N and their size and location, and the distribution of counts), and the fact that necessary and sufficient conditions for convergence have been provided earlier in this section. Instead, selected numerical simulations are presented to quantify the fraction of Poisson datasets that do not have a non-negative best-fit linear model and to illustrate a few representative cases.
For this purpose, 100 data sets were simulated for various values of the total number of counts M, initially assuming that the M counts were uniformly distributed among N = 100 equally spaced bins, following the same pattern of bins along the x axis as in Figures 2 and 3. As expected, based on the asymptotic results of this section, for , all datasets have an acceptable model (Figure 7, red curve). Then, the same simulations were repeated for a distribution of counts that is either linearly increasing or decreasing towards larger values of x, i.e. with samples drawn respectively from the probability distribution functions
with . (See Note 3.) For these cases, the simulations show that the number of acceptable models remains smaller even for large values of M. The right panel of Figure 7 also illustrates the fraction of data with a negative . As expected according to Lemma 5.10, uniformly distributed data (red curve) have a negative asymptotic for large M; moreover, the same applies for data distributed with a negative slope (blue curve). This is explained according to Property 5.1, since data points below the mid-point of the range drive the average of to values greater than 2/R, and that in turn causes to be negative. For data with a positive slope, even for large M there is a large fraction of data with a positive asymptotic limit of . These simulations can be used as examples of large-M data that do not have an acceptable solution using the linear model of Equation (2).
Figure 7.
(Left:) Fraction of datasets with an available best-fit non-negative linear model, as a function of the number of counts M. (Right:) Fraction of datasets with negative asymptotic value .
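The sampling of positions from linearly increasing and decreasing distributions, as used for the simulations above, can be sketched with inverse-transform sampling. The sketch assumes normalized densities 2y and 2−2y on the unit interval, rescaled linearly to the range of x; the function name sample_linear, the range 0 to 100 and the seed are hypothetical choices, and the bin layout of the paper's figures is not reproduced.

```python
import numpy as np


def sample_linear(m, x_start, x_end, increasing=True, rng=None):
    """Draw m positions from a linearly increasing or decreasing density."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=m)
    # Inverse-transform sampling on the normalized variable y in [0, 1]:
    # density 2y has CDF y**2; density 2 - 2y has CDF 2y - y**2.
    y = np.sqrt(u) if increasing else 1.0 - np.sqrt(1.0 - u)
    # Linear rescaling of y to the range of the independent variable.
    return x_start + (x_end - x_start) * y


rng = np.random.default_rng(0)
x_up = sample_linear(10_000, 0.0, 100.0, increasing=True, rng=rng)
x_down = sample_linear(10_000, 0.0, 100.0, increasing=False, rng=rng)

# Bin the simulated positions into N = 100 equal-width bins.
counts_up, edges = np.histogram(x_up, bins=100, range=(0.0, 100.0))
```

The binned counts can then be passed to whichever fitting routine is being tested.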
6. An extended linear model with a non-negative solution
The paper has identified cases where the maximum-likelihood equations do not yield an acceptable solution for the parameters of the linear model. In particular, this is true for all data with only one count (M = 1) and null counts in N−1 of the N available bins. This can simply be viewed as the inability to constrain two free parameters with just one non-zero data point. In such a case, it may be sufficient to model the data with a simple constant model, with a best-fit model equal to the sample average of the counts in all the bins (see, e.g. [3,4]). There are also other data sets with counts that do not have an acceptable, non-negative model. One such example was shown in Figure 2, for a dataset with M = 2. Section 5 also illustrated data with large M that do not have an acceptable solution (see, e.g. Figure 7).
Motivated by the need to have a linear model that is applicable to any situation, this section proposes a simple generalization of the linear model of Equation (2) that ensures an acceptable maximum-likelihood solution using the statistic for any Poisson dataset.
Definition 6.1 The extended non-negative linear model —
The proposed non-negative linear model is given by:
the standard linear model of Equation (2), when such model has an acceptable solution; otherwise,
the model is parameterized as one of the following three functions:
A one-parameter linear model pivoted to zero at the initial point , Equation (19), for which , and with a positive adjustable parameter .
A one-parameter linear model pivoted to zero at the final point , Equation (20), for which , with an adjustable parameter and therefore a negative slope.
A one-parameter constant model, Equation (21).
It will be shown that the three models of Equations (19), (20) and (21) have simple analytical solutions for their maximum-likelihood best-fit parameters (respectively , and ), and therefore it is always possible to use one of these models as an acceptable linear model for any dataset.
6.1. Maximum-likelihood solutions for the pivoted and constant linear models
For the linear model pivoted at , Equation (19) is used to evaluate the statistic, Equation (1). Assuming that the data covers the range R continuously, as also assumed for Equation (4), the term
| (22) |
leads to
| (23) |
where
| (24) |
is a term that is independent of the model, and therefore plays no role in the minimization of the statistic. The best-fit parameter is given by , leading to the simple analytical solution
| (25) |
For the linear model pivoted at , use of Equation (20) into Equation (1) leads to
| (26) |
and
| (27) |
with
| (28) |
The best-fit parameter is therefore given by
| (29) |
Finally, the best-fit constant model has a statistic of
| (30) |
with
| (31) |
This leads to a best-fit parameter
| (32) |
which is equivalent to the sample average of the data when multiplied by a uniform , as found in [3]. As already remarked after Equation (4), the equations developed in this section apply to data that cover continuously the range to . Data with gaps in the x variable require a simple modification to these equations that is presented in Section 6.3.
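The one-parameter fits of this section can be sketched numerically as follows. The sketch rests on several assumptions that are not taken from the equations of the paper: the pivoted models are written as λ_A (x − x_A) and λ_B (x_B − x), the constant model as λ_C, the expected counts in a bin are the model density at the bin center times the bin width, and the statistic has the common form C = 2 Σ [μ − y + y ln(y/μ)]. The function names cash and fit_one_parameter_models are hypothetical.

```python
import numpy as np


def cash(y, mu):
    """Cash statistic 2*sum(mu - y + y*ln(y/mu)); the log term is 0 at y = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pos = y > 0
    log_term = np.zeros_like(y)
    log_term[pos] = y[pos] * np.log(y[pos] / mu[pos])
    return 2.0 * np.sum(mu - y + log_term)


def fit_one_parameter_models(x, dx, y):
    """ML fits of the assumed pivoted and constant models for gap-free bins.

    x: bin centers, dx: bin widths, y: integer counts per bin.
    Returns a dict mapping model name -> (best-fit parameter, statistic).
    """
    x, dx, y = (np.asarray(v, dtype=float) for v in (x, dx, y))
    x_a, x_b = x[0] - dx[0] / 2, x[-1] + dx[-1] / 2   # ends of the range
    shapes = {
        "pivot_A": x - x_a,            # assumed form lambda_A * (x - x_A)
        "pivot_B": x_b - x,            # assumed form lambda_B * (x_B - x)
        "constant": np.ones_like(x),   # assumed form lambda_C
    }
    results = {}
    for name, shape in shapes.items():
        area = np.sum(shape * dx)      # integral of the unit-amplitude model
        lam = y.sum() / area           # zero of dC/dlambda: lambda * area = M
        results[name] = (lam, cash(y, lam * shape * dx))
    return results
```

Under these assumptions, and for data covering the range R without gaps, the best-fit amplitudes reduce to 2M/R² for both pivoted models and M/R for the constant model, which can be compared against Equations (25), (29) and (32).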
6.2. Use of the extended non-negative linear model
Equation (2), in combination with the extensions provided by Equations (19), (20) and (21), is to be used according to the following method, which defines the solution of the extended model.
Definition 6.2 Solution of the extended non-negative linear model —
Solution of the extended non-negative linear model is given by:
the solution with the standard linear model of Equation (2), if that solution is acceptable. As shown in Section 5 and specifically Lemma 5.8, this solution is guaranteed to be unique, when it exists.
If a solution with the standard linear model is not available, the solution is given by the best-fit model that gives the lowest value of the statistic, among the three options provided by Equations (19), (20) and (21).
Lemma 6.3 Existence and uniqueness of solution for the extended non-negative linear model —
There exists one and only one maximum-likelihood solution for the extended non-negative linear model fit to any Poisson-distributed data.
Proof.
The proof is a direct consequence of the fact that there is at most one non-negative solution for the model of Equation (2) (Lemma 5.8), and of Definition 6.2 for the solution of the extended model.
Remark 6.1 Expanded algorithm for the extended non-negative linear model —
The algorithm presented in Remark 5.1 can be extended to the non-negative linear model. When the linear model of Equation (2) fails to produce an acceptable solution, the following two additional steps must be added:
- (8) Calculate the three additional best-fit linear models (pivoted at A, pivoted at B and constant) and their statistic, using the analytical formulas (19), (20) and (21).
- (9) Accept as the best-fit model the one with the lowest statistic. Notice that if the original linear model of Equation (2) is acceptable, its value of the statistic will be lower than that of the other three linear models.
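The selection rule of the two additional steps can be sketched in a few lines. The function name select_extended_model and the representation of an unavailable standard fit by None are assumptions of this sketch, and the statistic values in the usage lines are made-up illustrative numbers, not results from the paper.

```python
def select_extended_model(standard_fit, fallback_fits):
    """Return (name, statistic) of the model chosen by the extended procedure.

    standard_fit: statistic of the standard linear model, or None when that
        model has no acceptable (non-negative) solution.
    fallback_fits: dict mapping model name -> statistic for the pivoted and
        constant models.
    """
    if standard_fit is not None:
        # An acceptable standard solution is always preferred.
        return "standard", standard_fit
    # Otherwise keep the one-parameter model with the lowest statistic.
    name = min(fallback_fits, key=fallback_fits.get)
    return name, fallback_fits[name]


# Illustrative values only.
fallbacks = {"pivot_A": 12.0, "pivot_B": 9.5, "constant": 10.1}
choice_without_standard = select_extended_model(None, fallbacks)
choice_with_standard = select_extended_model(8.0, fallbacks)
```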
The use of the extended non-negative linear model is illustrated in the two following examples.
Example 6.4 Use of the extended non-negative model for data with no acceptable standard linear model —
In the left panel of Figure 8 are shown the results for the same M = 2 data of Figure 2, for which a non-negative linear model according to Equation (2) could not be found. The data can be fit with the pivoted and constant linear models, which yield best-fit statistic values of , and . The values of the statistic indicate that the linear model pivoted at is the most accurate representation of these data, and should be regarded as the best-fit linear model.
Figure 8.
Pivoted and constant linear models for the data of Figure 2 (with M = 2, left) and Figure 3 (M = 3, right).
Example 6.5 Use of the extended non-negative model for data with an acceptable standard linear model —
The right panel of Figure 8 shows the results for the M = 3 model of Figure 3, for which a best-fit non-negative model with the ‘standard’ linear model was in fact available, for a statistic value of C = 20.996. The pivoted and constant linear models yield values of , and , all larger than the value for the best-fit standard linear model. This analysis confirms that the ‘standard’ linear model, when available, is indeed the most accurate linear representation of the data.
It is in principle possible to devise a linear model different from those of Equations (19), (20) and (21), that may yield a lower value of the statistic. There are in fact infinitely many such models, e.g. by fixing an arbitrary intercept of the axis. The choices made by the three simple extensions discussed in this paper are intended to provide simple alternatives to the full linear model that have a simple interpretation and likewise simple analytical solutions.
6.3. Binning and gaps in data
The methods of analysis presented in this paper can be applied to data with any binning, including data with non-uniform bin sizes. The bin sizes, however, will have an effect on the best-fit model, as can be seen by the fact that the function is a function of , where is the center coordinate of the j-th bin. When Poisson data are collected on an event-by-event basis, the choice of bin size must be made based on considerations on the methods of collection of the data and the instruments used for the collection.
In Equations (4), (22) and (26) it was assumed that the range of integration of the x variable was continuous, therefore implying that the data covers the to range without any gaps or missing data. It is possible to provide a simple generalization to those equations to include gaps in the data. This is in fact a situation of practical importance, since certain regions of the independent variable may be without data for a variety of reasons. A common situation is the exclusion of portions of the x variable because of poor calibration of the instrument (e.g. the exclusion of a wavelength range because of detector inefficiencies), or because an instrument was not operating during certain time intervals. In these cases, one cannot just assign a value of zero counts to that range of the independent variable, but rather the intervals must be explicitly removed from the data, therefore creating gaps in an otherwise continuous variable.
Definition 6.6 Gaps in the data —
A gap in the data is defined as a continuous interval of the independent variable between and , of length , that is not covered by any of the bins. A Poisson data set may have g non-overlapping gaps between and , , with the mid-point of each gap and . The length of all gaps in the independent variable x is .
The following lemmas summarize the changes that need to be made to analyze data that contain gaps in the independent variable.
Lemma 6.7 Modifications to the statistic and to the functions and for gaps in the data —
When the data have gaps, the statistic becomes
(33)
Moreover, the function whose zero provides the best-fit value of a becomes
(34)
where R is replaced by a modified given by
(35)
and the best-fit solution for the parameter λ is
(36)
Proof.
The modification to the statistic to account for the presence of gaps is provided by changing Equation (4) to
(37)
The use of Equation (37) in place of Equation (4) leads to the statistic of Equation (33) in place of the original Equation (5). Taking the derivatives of C with respect to a and λ and setting them to zero leads to
and
Notice that
where is the combined length of all (non-overlapping) gaps. Defining
(38)
leads to
thus proving Equation (36), and
where is the usual function as defined in Equation (9). Simple algebraic modifications and elimination of λ lead to
where
thus proving Equation (34).
Lemma 6.7 shows that, when there are gaps in the independent variable, the method of analysis to find a solution for a and λ proceeds in the same way as when there are no gaps, provided the function uses the parameter in place of R. Once the best-fit value of a is found, λ can be calculated analytically by making a change in the denominator of the function to account for the gap , according to Equation (36).
Lemma 6.8 Modifications to the statistic and to the best-fit parameters of the pivoted and constant models for gaps in the data —
When the data have gaps, the statistic for the pivoted and constant models become
(39)
with
(40)
The best-fit model parameters become
(41)
Proof.
For the model pivoted at A, Equation (22) is modified by the presence of gaps as
(42)
Defining
(43)
and noticing that
leads to
Since and , it follows that
Then, taking a derivative of with respect to and setting it to zero completes the proof for the model pivoted at A.
For the model pivoted at B,
(44)
where
From this, the equations for and follow after a few simple algebra steps.
The results for the constant model follow immediately from the constancy of the function .
Lemma 6.8 shows that the pivoted and constant models retain a simple analytical solution even in the presence of gaps in the data. An application of the fit to Poisson data with non-uniform bin sizes and with a gap in the data is provided in the following example.
Example 6.9 Data with non-uniform bin sizes and a gap in the data —
The data chosen for this example span a range of the independent variable between and , with a gap between and . All nine measurements have a value of , with bin sizes of for the first three data points, and for the other six data points, as shown in Figure 9. The data have an acceptable solution for the standard linear model (in black) with a = 0.188, , for a best-fit statistic of . Given the non-uniform bin sizes, the best-fit density function (black continuous line, in units of counts-per-bin-size) differs from the best-fit model (black step-wise curve, in units of counts or counts-per-bin). The constant model (yellow) has a best-fit parameter of , according to Equation (41) with M = 9, R = 9 and , with , the linear model pivoted at A has and , and the linear model pivoted at B has , and .
Figure 9.
Best-fit linear models for data with non-uniform bins and with a gap in the data. The dot-dashed curves are the density functions and the solid step-wise curves are the models for the integer data.
In summary, there are no significant additional complications for the analysis of data that contain a number of gaps or missing data. The following algorithm summarizes the changes required to analyze data with gaps.
Remark 6.2 Algorithm to implement changes in the analysis when gaps in the data are present —
This algorithm details the additions and modifications required for the algorithms of Remarks 5.1 and 6.1 when data gaps are present, following the same enumeration.
(Additional step) Calculate the location , and range of each gap, the total gap length , and according to Equation (35).
Hereafter replace R with in the definition of .
Use Equation (36) instead of (6) to calculate the value corresponding to an acceptable solution a.
For the calculation of the C statistics and best-fit parameters of the constant and pivoted models, use respectively Equation (39) (instead of Equations (23), (27) and (30)) and Equation (41) (instead of Equations (25), (29) and (32)).
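The gap bookkeeping of the steps above can be sketched as follows. The sketch assumes that bins are given by their centers and widths, that a gap is any uncovered interval between consecutive bins, and that the constant and pivoted models have the forms λ_C and λ_A (x − x_A); the resulting expressions are derived under those assumptions and are to be compared with Equation (41), not taken as a transcription of it. All function names are hypothetical, and the bin layout in the usage lines is illustrative and is not the data of Example 6.9.

```python
import numpy as np


def gap_summary(x, dx):
    """Locate gaps between consecutive bins; return midpoints and lengths."""
    x, dx = np.asarray(x, dtype=float), np.asarray(dx, dtype=float)
    left, right = x - dx / 2, x + dx / 2
    lengths = left[1:] - right[:-1]             # space between adjacent bins
    is_gap = lengths > 1e-12 * (right[-1] - left[0])
    mid = 0.5 * (left[1:] + right[:-1])
    return mid[is_gap], lengths[is_gap]


def constant_rate_with_gaps(x, dx, y):
    """ML constant rate when only the covered length R - R_G is exposed."""
    x, dx, y = (np.asarray(v, dtype=float) for v in (x, dx, y))
    R = (x[-1] + dx[-1] / 2) - (x[0] - dx[0] / 2)
    _, lengths = gap_summary(x, dx)
    return y.sum() / (R - lengths.sum())


def pivot_a_rate_with_gaps(x, dx, y):
    """ML amplitude of the assumed model lambda_A * (x - x_A) with gaps."""
    x, dx, y = (np.asarray(v, dtype=float) for v in (x, dx, y))
    x_a = x[0] - dx[0] / 2
    R = (x[-1] + dx[-1] / 2) - x_a
    mid, lengths = gap_summary(x, dx)
    # Integral of (x - x_A) over the range, minus the part inside the gaps.
    area = R**2 / 2 - np.sum(lengths * (mid - x_a))
    return y.sum() / area


# Illustrative layout: three bins of width 0.5, a gap, six bins of width 1.
dx = np.array([0.5] * 3 + [1.0] * 6)
x = np.array([0.25, 0.75, 1.25, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5])
y = np.ones(9)
gap_mid, gap_len = gap_summary(x, dx)
lam_c = constant_rate_with_gaps(x, dx, y)
lam_a = pivot_a_rate_with_gaps(x, dx, y)
```

The pivoted amplitude can be cross-checked by summing the model over the bins directly, since the sum of (x_j − x_A) Δx_j over the bins already excludes the gaps.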
6.4. A note on the distribution of the statistic
It is well known that, in the large-count limit, the statistic – i.e. the statistic evaluated for the best-fit linear model – is expected to be distributed like a χ² distribution with N−2 degrees of freedom, where N is the number of bins and 2 is the number of adjustable free parameters of the linear model (e.g. [3,8]). Moreover, the properties of the statistic for a fixed model with no free parameters are also known accurately for any value of the parent Poisson mean [4,13]. What remains to be analyzed in further detail is the effect of free parameters on the distribution of , in the low-count regime. The purpose of this paper is to present a method to evaluate the best-fit parameters of the linear model, precisely with the intent to further study the distribution of via numerical simulations that rely on this method of analysis.
For a significant number of data sets, and especially for data with a small number of counts, the only non-negative linear model is one of the three extensions – all of them with just one adjustable parameter, instead of two of the traditional linear model. This requirement that the model be non-negative was introduced by the use of the Poisson distribution, and did not enter the discussion of Gaussian-distributed datasets that can be fit with the distribution. It is likely that such new requirement will result in differences between the distributions of and for the linear model in the low-count regime, with implications for hypothesis testing and confidence intervals on the best-fit parameters. The distribution of the for the linear model will be presented in a separate paper.
7. Discussion and conclusions
This paper has presented a new semi-analytical method to find the best-fit parameters of a linear model for the fit to integer-valued counting data, using the Poisson-based statistic. The method consists first of finding a solution for the non-linear equation , where a is one of the two parameters of the model. The other parameter λ is then calculated analytically via a simple analytical function . The two parameters a and λ must be such that the linear model is non-negative in each bin, in order to ensure the applicability of the Poisson distribution. The analysis presented in this paper shows that such requirement leads, in fact, to the uniqueness of the best-fit model, when such solution is available. This is clearly a very desirable property of the method, and a necessary condition for the use of this method to analyze Poisson-distributed data.
This paper has identified cases where low-count Poisson data do not have a suitable non-negative best-fit linear model according to the standard parameterization of Equation (2). For this reason, an extended linear model was proposed that guarantees a unique non-negative solution for any Poisson data set. This is accomplished by pivoting the linear model to either end of the range of the independent variable or by using a simple constant linear model, when the traditional linear model leads to an unsuitable solution. Thanks to simple analytical solutions for the best-fit parameter of these extensions, the use of the extended non-negative linear model remains straightforward.
The availability of a simple method to identify the best-fit parameters of a linear model for Poisson data of any number of counts makes it possible to further our understanding of the statistic. In particular, it is now possible to study the distribution of the statistic for one of the most commonly used models with adjustable parameters, i.e. the linear model, especially in the low-count regime where its distribution is not known exactly.
Acknowledgements
The author gratefully acknowledges the support of NASA Chandra grant AR6-17018X, to support the development of the Cash statistic.
Funding Statement
The author gratefully acknowledges the support of NASA Chandra [grant number AR6-17018X].
Notes
1. A maximum-likelihood solution for the standard form of the linear model with the statistic is reported in [3]. It leads to a set of two non-linear coupled equations, whose numerical solution can be challenging.
2. It is useful to point out that a log-linear model, as obtained for example using a logarithmic link function within the context of generalized linear models (see Section 2.3), would have led to coupled equations involving the exponential of the parameters, in place of Equations (6) and (7).
3. Random samples from these distributions are readily obtained by simulating the associated normalized linear variables y in [0, 1] (with distributions of 2y and 2−2y, respectively for an increasing and decreasing distribution). Samples of x are then obtained by linearly rescaling the samples of y to the range of the independent variable. Simulations of the normalized distributions for y are easily accomplished with the aid of a uniform variable u in [0, 1], which is commonly available in most software packages. With the aid of the quantile function F⁻¹, where F is the cumulative distribution of y (respectively F(y) = y² and F(y) = 2y − y² for the two linear models), the variable y is simulated as y = F⁻¹(u) (see, e.g. Section 4.8 of [3]). This means that random samples of the normalized increasing and decreasing distributions are obtained respectively via y = √u and y = 1 − √(1 − u), where u are samples from a uniform distribution in [0, 1].
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Baker S. and Cousins R.D., Clarification of the use of CHI-square and likelihood functions in fits to histograms, Nucl. Instrum. Methods Phys. Res. 221 (1984), pp. 437–442. doi: 10.1016/0167-5087(84)90016-4
- 2. Bevington P.R. and Robinson D.K., Data Reduction and Error Analysis for the Physical Sciences, 3rd ed., New York: McGraw Hill, 2003.
- 3. Bonamente M., Statistics and Analysis of Scientific Data, 2nd ed., Graduate Texts in Physics, Springer, 2017.
- 4. Bonamente M., Distribution of the C statistic with applications to the sample mean of Poisson data, J. Appl. Stat. 47 (2019), pp. 1–22. doi: 10.1080/02664763.2019.1704703
- 5. Bonamente M., Probability models of chance fluctuations in spectra of astronomical sources with applications to X-ray absorption lines, J. Appl. Stat. 46 (2019), pp. 1129–1154. doi: 10.1080/02664763.2018.1531976
- 6. Bonat W.H., Jorgensen B., Kokonendji C.C., Hinde J., and Demetrio C., Extended Poisson-Tweedie: properties and regression models for count data, Stat. Model. 18 (2018), pp. 24–49. doi: 10.1177/1471082X17715718
- 7. Cash W., Generation of confidence intervals for model parameters in X-ray astronomy, Astron. Astrophys. 52 (1976), p. 307.
- 8. Cash W., Parameter estimation in astronomy through application of the likelihood ratio, Astrophys. J. 228 (1979), p. 939. Available at http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1979ApJ...228..939C&db_key=AST. doi: 10.1086/156922
- 9. Dobson A. and Barnett A., An Introduction to Generalized Linear Models, 4th ed., Boca Raton: CRC Press, 2018.
- 10. El-Sayyad G.M., Bayesian and classical analysis of Poisson regression, J. R. Stat. Soc. Ser. B (Methodological) 35 (1973), pp. 445–451. Available at http://www.jstor.org/stable/2985109.
- 11. Haselimashhadi H., Vinciotti V., and Yu K., A novel Bayesian regression model for counts with an application to health data, J. Appl. Stat. 45 (2018), pp. 1085–1105. doi: 10.1080/02664763.2017.1342782
- 12. Humphrey P.J., Liu W., and Buote D.A., χ² and Poissonian data: biases even in the high-count regime and how to avoid them, Astrophys. J. 693 (2009), pp. 822–829. doi: 10.1088/0004-637X/693/1/822
- 13. Kaastra J.S., On the use of C-stat in testing models for X-ray spectra, Astron. Astrophys. 605 (2017), p. A51. doi: 10.1051/0004-6361/201629319
- 14. McCullagh P. and Nelder J., Generalized Linear Models, 2nd ed., London: Chapman & Hall/CRC, 1989.
- 15. Mock D.M., Matthews N.I., Zhu S., Strauss R.G., Schmidt R.L., Nalbant D., Cress G.A., and Widness J.A., Red blood cell (RBC) survival determined in humans using RBCs labeled at multiple biotin densities, Transfusion 51 (2011), pp. 1047–1057. doi: 10.1111/j.1537-2995.2010.02926.x
- 16. Nelder J.A. and Wedderburn R.W.M., Generalized linear models, J. R. Statist. Soc. Ser. A (General) 135 (1972), pp. 370–384. Available at http://www.jstor.org/stable/2344614. doi: 10.2307/2344614
- 17. Scargle J.D., Norris J.P., Jackson B., and Chiang J., Studies in astronomical time series analysis. VI. Bayesian block representations, Astrophys. J. 764 (2013), p. 167. doi: 10.1088/0004-637X/764/2/167
- 18. Sellers K.F. and Shmueli G., A flexible regression model for count data, Ann. Appl. Stat. 4 (2010), pp. 943–961. Available at http://www.jstor.org/stable/29765537. doi: 10.1214/09-AOAS306
- 19. Shmueli G., Minka T.P., Kadane J.B., Borle S., and Boatwright P., A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution, J. R. Statist. Soc. Ser. C (Appl. Statist.) 54 (2005), pp. 127–142. Available at http://www.jstor.org/stable/3592603. doi: 10.1111/j.1467-9876.2005.00474.x
- 20. Valenti S., Howell D.A., Stritzinger M.D., Graham M.L., Hosseinzadeh G., Arcavi I., Bildsten L., Jerkstrand A., McCully C., Pastorello A., Piro A.L., Sand D., Smartt S.J., Terreran G., Baltay C., Benetti S., Brown P., Filippenko A.V., Fraser M., Rabinowitz D., Sullivan M., and Yuan F., The diversity of type II supernova versus the similarity in their progenitors, Mon. Not. R. Astron. Soc. 459 (2016), pp. 3939–3962. doi: 10.1093/mnras/stw870
- 21. Yee T., Vector Generalized Linear and Additive Models, New York: Springer, 2015.
- 22. Yee T.W. and Wild C.J., Vector generalized additive models, J. R. Stat. Soc. Ser. B (Methodological) 58 (1996), pp. 481–493. Available at http://www.jstor.org/stable/2345888.








