Abstract
Motivated by a problem encountered in the analysis of cell cycle gene expression data, this article deals with the estimation of parameters subject to order restrictions on a unit circle. A normal eukaryotic cell cycle has four major phases during cell division, and a cell cycle gene has its peak expression (phase angle) during the phase that may correspond to its biological function. Because the phases are ordered along a circle, the phase angles of cell cycle genes are ordered unknown parameters on a unit circle. The problem of interest is to estimate the phase angles using the information regarding the order among them. We address this problem by developing a circular version of the well-known isotonic regression for Euclidean data. Because of the underlying geometry, the standard pool adjacent violator algorithm (PAVA) cannot be used for deriving the circular isotonic regression estimator (CIRE). However, PAVA can be modified to obtain a computationally efficient algorithm for deriving the CIRE. We illustrate the CIRE by estimating the phase angles of some well-known cell cycle genes using the unrestricted estimators obtained in the literature.
Keywords: Circular isotonic regression, Order restrictions on circle, Phase angles, Pool adjacent violator algorithm, Restricted maximum likelihood estimators, von Mises distribution
1. INTRODUCTION
In some applications, researchers are interested in estimating angular parameters when they are intrinsically ordered around a unit circle. For instance, some biologists are interested in determining the phases of cell cycle genes (e.g., Whitfield et al. 2002), which are functionally ordered around a circle. Similarly, some are interested in determining phase regulation of circadian genes (Akashi et al. 2006; Nakamura et al. 2008), or a gynecologist may be interested in studying the effect of stress on the duration of follicular phase in premenopausal women (Xiao et al. 1998). Problems such as these can be formulated as a problem of estimating a vector φ = (φ1, φ2, …, φq)′ of angular parameters for which the components are ordered around a unit circle. There exists more than 50 years of literature for the case when φ ∈ Rq, the Euclidean space, and its components are constrained by inequalities (see Robertson, Wright, and Dykstra 1988; Silvapulle and Sen 2005; Van Eeden 2006). However, no literature exists for the case when the components of φ are ordered points on a unit circle, which is the focus of this article. As demonstrated here, the existing methodology for Rq cannot be trivially extended to the current problem.
This article is motivated by a problem encountered in the analysis of cell cycle gene expression data. A normal eukaryotic cell goes through four phases—namely, G1, S, G2, and M phases during cell division (Fig. 1). Each phase has a well-defined function and, within each phase, a collection of genes (called “cell cycle genes”) plays an important role. For instance, some genes are associated with DNA replication during the S phase, whereas some are with mitosis during the M phase. Cell biologists are interested in determining the phase associated with a cell cycle gene (Whitfield et al. 2002) that corresponds to its peak expression during the cell cycle.
Figure 1.

Phases of a normal cell division cycle. The dark lines describe the boundaries of the phases and the arrows correspond to the angle of peak expression of genes, Gene 1 and Gene 2. The outside arrow points in the direction of the cell cycle.
Typically at the start of the experiment all cells are synchronized by arresting them at a specific phase of cell cycle. This establishes the pole of the circle. Cells are then released to measure messenger RNA (mRNA) gene expressions at various times using microarrays as the cells go through the cell cycle. The phase angle φ of a gene is the angle corresponding to its peak expression relative to the pole. As a result of their biological functions, the phase angles of cell cycle genes are expected to be ordered on a circle. For instance, the cyclins CCNE1, CCNA2, and CCNB1 have peak mRNA expression in the G1/S, G2, and G2/M phases, respectively (Pines and Hunter 1989, 1990; Whitfield et al. 2002). The DNA metabolism gene RRM2 peaks during the early S phase (Bjorklund et al. 1990), whereas the gene VEGFC peaks during the M/G1 phase (Datta and Sokhansanj 2007). Hence, if cells are arrested in the G1/S boundary, then we may expect the simple order 0 ≤ φCCNE1 ≤ φRRM2 ≤ φCCNA2 ≤ φCCNB1 ≤ φVEGFC ≤ 2π.
Biologically, the order among cell cycle genes should be rotation invariant and, hence, invariant to the location of the pole. Components of φ are said to follow a rotation-invariant simple order around a unit circle, denoted by φ1 ≺ φ2 ≺ … ≺ φp ≺ φ1, if for all i = 1, …, p, φi “follows” φi−1 and is “followed by” φi+1 (in the counterclockwise direction), where φ0 ≡ φp and φp+1 ≡ φ1. This rotation-invariant simple order may be expressed as the union of simple orders {0 ≤ φ1 ≤ φ2 ≤ … ≤ φp ≤ 2π}, {0 ≤ φ2 ≤ φ3 ≤ … ≤ φp ≤ φ1 ≤ 2π}, …, {0 ≤ φp ≤ φ1 ≤ φ2 ≤ … ≤ φp−1 ≤ 2π}. The estimator satisfying the rotation-invariant simple order can be obtained by deriving the estimator under each of the individual simple order restrictions and by selecting the one that has the smallest value of the objective function in the minimization problem defined in Section 2. Thus, it is enough to develop the methodology for obtaining the order-restricted estimators under {0 ≤ φ1 ≤ φ2 ≤ … ≤ φp ≤ 2π}, which is the focus of this article. The methodology can then be applied to each of the other subsets that make up the order restriction φ1 ≺ φ2 ≺ … ≺ φp ≺ φ1 to obtain the rotation-invariant simple order estimate.
Liu et al. (2004) developed a nonlinear regression model called the “random periods model” for estimating the phase angle of a cell cycle gene (see Supplementary Materials for details). Because they do not impose the order restriction among the phase angles, their unrestricted estimates φ̂=(φ̂1, φ̂2, …, φ̂q)′ may not necessarily follow the desired order restriction.
Analogous to Rq, the objective of this article is to construct a “circular” isotonic regression estimator of φ using the unrestricted estimator φ̂ by solving a suitable minimization problem described in Section 2. Recall that similar to normal distribution for Rq data, the von Mises distribution is commonly used for describing circular data (Mardia and Jupp 2000). For independently and normally distributed data, the isotonic regression estimator yields the restricted maximum likelihood estimator (RMLE) of the population means. Similarly we observe that the circular isotonic regression estimator (CIRE), introduced in Section 2, is the RMLE of φ, if the components of φ̂ are independently distributed according to von Mises distribution. We demonstrate that the standard pool adjacent violators algorithm (PAVA) (Robertson, Wright, and Dykstra 1988), used for deriving the isotonic regression estimator when φ ∈ Rq, is not suitable in the current context. In that section we also discuss the existence and uniqueness of the CIRE. An algorithm for computing the CIRE is described in Section 3. It is based on PAVA, but is tailored for circular data. As in standard PAVA, the algorithm is applied sequentially and the solution at each stage depends on the geometry of the unrestricted estimates wrapped around the circle. The algorithm investigates the geometry of the estimates at each stage and updates the estimates accordingly. For some configurations, the solution is given by PAVA, for some the solution is given by the boundary points (0 or 2π), whereas for others the estimates are suitably pooled. Using simulation studies, we evaluate the performance of the CIRE in Section 4, and in Section 5 we illustrate the CIRE. Concluding remarks along with open research problems are discussed in Section 6. Proofs are in the Appendix and some supplementary details are in the accompanying supplementary materials.
2. ISOTONIC REGRESSION AND PAVA
2.1 PAVA for Euclidean Space Data: A Brief Review
Suppose φ̂ = (φ̂1, φ̂2, …, φ̂q)′ is an unrestricted estimator of a population parameter φ ∈ A ⊆ Rq, where A = {(φ1, φ2, …, φq)′ ∈ Rq: φ1 ≤ φ2 ≤ … ≤ φq}. Then the isotonic regression estimator is the point of A that is “closest” to φ̂, where the distance is the weighted Euclidean norm $\|\hat\phi - \phi\|_w^2 = \sum_{i=1}^{q} w_i(\hat\phi_i - \phi_i)^2$ and wi are suitable weights. Thus, the isotonic regression estimator is a solution to the following minimization problem: $\min_{\phi \in A} \sum_{i=1}^{q} w_i(\hat\phi_i - \phi_i)^2$. Because the distance is convex and is a well-defined norm, the solution is unique and is determined by the orthogonal projection of φ̂ onto A using PAVA (Robertson, Wright, and Dykstra 1988). According to PAVA, whenever two adjacent unrestricted estimates violate the hypothesized inequality, such estimates are replaced by their weighted average. Note that for any two real numbers x, y (with x ≤ y), their weighted average always lies between x and y. This is known as the “Cauchy mean value” property (Robertson and Wright 1980) and it plays a crucial role when using PAVA for independent Rq data.
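The pooling rule just described can be sketched in a few lines of Python. This is a minimal illustration of weighted PAVA for Euclidean data, not the authors' implementation; the function name `pava` and the block representation are choices made here for illustration only.

```python
def pava(y, w=None):
    """Weighted isotonic (nondecreasing) regression by pooling adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    # Expand the blocks back to one fitted value per original index.
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit
```

Each pooled value is a weighted average of the values it replaces, so it lies between them; this is the Cauchy mean value property that the circular setting lacks.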
2.2 Notations and Some Underlying Geometry
Many of the basic concepts used for Rq data are not valid for circular data. For instance, the arithmetic mean of angular data may not be physically meaningful. Suppose two birds are flying roughly east, at angles η1 = 0.52 radians (30°) and η2 = 5.76 radians (330°), respectively. Then the arithmetic mean is η̄ = π radians (i.e., 180°), suggesting that the birds on the average are actually flying westward, which contradicts common sense. Instead, the angular mean is 0 radians (i.e., 0°; see Fig. S-1 in supplementary materials).
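More generally, suppose Sg = {θg1, θg2, …, θgng} is a random sample of ng observations from a population with mean direction φg, g = 1, 2, …, q. Then the sample mean direction of Sg (see Mardia and Jupp 2000) is given by

$$
\mathrm{Ave}(S_g) =
\begin{cases}
\arctan(S_g/C_g), & C_g > 0,\ S_g \ge 0,\\
\pi/2, & C_g = 0,\ S_g > 0,\\
\arctan(S_g/C_g) + \pi, & C_g < 0,\\
3\pi/2, & C_g = 0,\ S_g < 0,\\
\arctan(S_g/C_g) + 2\pi, & C_g > 0,\ S_g < 0,
\end{cases}
\qquad (1)
$$

where $C_g = \sum_{j=1}^{n_g} \cos\theta_{gj}$, $S_g = \sum_{j=1}^{n_g} \sin\theta_{gj}$, and the resultant length is $r_g = \sqrt{C_g^2 + S_g^2}$, which is a measure of concentration (inverse of dispersion) of a dataset.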
In the previous example, 0 does not lie between 0.52 and 5.76, and hence the angular mean does not always satisfy the Cauchy mean value property. As we see later, this causes difficulties in using PAVA. Throughout this article the term “PAVA” refers to the usual PAVA, with the exception that means are computed using (1), rather than the weighted arithmetic means.
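As a small illustration of the mean direction and the resultant length, the following Python sketch computes both from the sums of cosines and sines. The function name `mean_direction` is hypothetical, and the use of unnormalized sums (so that a sample of n identical angles has resultant length n) is an assumption made here, chosen because it is consistent with the weights implied by Example 2.

```python
import math

def mean_direction(angles):
    """Return (mean direction in [0, 2*pi), resultant length) of a sample of angles."""
    C = sum(math.cos(a) for a in angles)   # sum of cosines
    S = sum(math.sin(a) for a in angles)   # sum of sines
    r = math.hypot(C, S)                   # resultant length
    mean = math.atan2(S, C) % (2 * math.pi)
    return mean, r

# The two birds of the example: the mean direction is (circularly) close to 0,
# whereas the arithmetic mean is close to pi.
m, r = mean_direction([0.52, 5.76])
arithmetic = (0.52 + 5.76) / 2
```

Note that `m` may be returned as a value just below 2π rather than just above 0; both describe the same eastward direction.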
Just as the arithmetic mean is not a suitable measure of average, the squared Euclidean distance (η1 − η2)² between a pair of angles η1, η2 is not a suitable measure of distance for angular data. Suppose (η1, η2, η3) = (0, π, 2π); then (η1 − η3)² > (η1 − η2)², suggesting that 0 is closer to π than to 2π, which is counterintuitive because 0 and 2π are the same point on the circle. A natural distance for angular data is the “angular distance” d(η1, η2) = 1 − cos(η1 − η2) (Mardia and Jupp 2000). This measure is proportional to the squared Euclidean distance between a1 and a2, where ai = (cos ηi, sin ηi)′, i = 1, 2, because (a1 − a2)′(a1 − a2) = 2(1 − cos(η1 − η2)). Thus, the angular distance between α = (α1, α2, …, αq)′ and β = (β1, β2, …, βq)′ is $d(\alpha, \beta) = \|\delta\|^2 = \sum_{i=1}^{q} \{1 - \cos(\alpha_i - \beta_i)\}$, where $\delta = \alpha - \beta$. Note that for λ > 0, ‖λδ‖² ≠ λ²‖δ‖² in general; hence, ‖δ‖² is not a norm.
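Let Ave(S) = (Ave(S1), …, Ave(Sq))′ be the vector of sample mean directions. Then the distance between α and Ave(S) is called the sum of circular error (SCE) and is defined by

$$
\mathrm{SCE}(\alpha, \mathrm{Ave}(S)) = \sum_{g=1}^{q} r_g \{1 - \cos(\alpha_g - \mathrm{Ave}(S_g))\}. \qquad (2)
$$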
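The sum of circular error can be sketched as a short Python function. This assumes the SCE is the sum of angular distances 1 − cos(·) between the candidate vector and the sample mean directions, weighted by the resultant lengths; the weighting and the function name `sce` are assumptions made for this illustration.

```python
import math

def sce(alpha, phi_hat, r=None):
    """Sum of circular error between candidate angles and unrestricted estimates."""
    if r is None:
        r = [1.0] * len(alpha)   # one observation per component
    return sum(ri * (1.0 - math.cos(a - p))
               for a, p, ri in zip(alpha, phi_hat, r))

# Unlike the squared Euclidean distance, 0 and 2*pi are at distance 0,
# while 0 and pi are as far apart as two angles can be.
d_same_point = sce([0.0], [2 * math.pi])
d_opposite = sce([0.0], [math.pi])
```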
2.3 CIRE and RMLE
Let C = {x ∈ [0, 2π]q: 0 ≤ x1 ≤ x2 ≤ … ≤ xq ≤ 2π} denote the simple order set. Then the problem of CIRE is to determine φ̃ ∈ C that minimizes (2). Equivalently, the problem is to determine φ̃ ∈ C that is “closest” to the unrestricted estimator Ave(S). Because ‖δ‖² is not a norm, unlike the isotonic regression in the Euclidean case, it may not be possible to derive the CIRE using an orthogonal projection. Also, the CIRE need not be unique, as noted in Example S-2 in the supplementary materials. This, together with the fact that the Cauchy mean value property is not satisfied by angular data, prompts us to develop an alternate strategy.
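Suppose that θgj, g = 1, 2, …, q, j = 1, 2, …, ng are independently distributed according to von Mises distribution, represented by M(φg, κ), with mean direction φg and known concentration parameter κ. Then the log-likelihood function is given by

$$
\ell(\phi) = -n \log\{2\pi I_0(\kappa)\} + \kappa \sum_{g=1}^{q} \sum_{j=1}^{n_g} \cos(\theta_{gj} - \phi_g), \qquad n = \sum_{g=1}^{q} n_g, \qquad (3)
$$

where I0 is the modified Bessel function of the first kind of order 0.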
From (3), the unrestricted maximum likelihood estimator (UMLE) of φi, i = 1, 2, …, q, is φ̂i = Ave(Si) (see Mardia and Jupp 2000). Hence, in the rest of this article we shall denote the unrestricted estimator of φi by φ̂i instead of Ave(Si). Note that determination of the CIRE is equivalent to maximization of (3) under the constraint φ ∈ C, and hence when θgj are independently distributed according to von Mises distribution, the CIRE is also the RMLE. The von Mises distribution is briefly reviewed in the supplementary materials. Throughout this article, the word “pooling” refers to averaging the angles using (1). Also, we shall denote the PAVA estimator by φ̃PAVA and the CIRE by φ̃.
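The equivalence between maximizing the von Mises log-likelihood and minimizing the SCE can be made explicit with a short derivation. It is a sketch under the assumption that the SCE weights each component by its resultant length $r_g$, with $C_g = \sum_j \cos\theta_{gj} = r_g\cos\hat\phi_g$ and $S_g = \sum_j \sin\theta_{gj} = r_g\sin\hat\phi_g$.

```latex
\begin{aligned}
\sum_{j=1}^{n_g} \cos(\theta_{gj} - \phi_g)
  &= \cos\phi_g \sum_{j=1}^{n_g} \cos\theta_{gj}
   + \sin\phi_g \sum_{j=1}^{n_g} \sin\theta_{gj} \\
  &= r_g\,(\cos\phi_g \cos\hat\phi_g + \sin\phi_g \sin\hat\phi_g)
   = r_g \cos(\hat\phi_g - \phi_g), \\[4pt]
\ell(\phi)
  &= -n \log\{2\pi I_0(\kappa)\} + \kappa \sum_{g=1}^{q} r_g \cos(\hat\phi_g - \phi_g) \\
  &= \Bigl[-n \log\{2\pi I_0(\kappa)\} + \kappa \sum_{g=1}^{q} r_g\Bigr]
     - \kappa \sum_{g=1}^{q} r_g \{1 - \cos(\hat\phi_g - \phi_g)\}.
\end{aligned}
```

The bracketed term does not depend on φ, so for known κ > 0 maximizing the log-likelihood over C is the same as minimizing the weighted sum of angular distances over C.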
2.4 Existence and the Uniqueness of the CIRE
We now provide a characterization for CIRE, which serves as the building block for the computational algorithm described in Section 3.
Theorem 1
If φ ∈ C, then φ̃ exists. Furthermore, if φ̂ ∉ A, then φ̃ is defined by the blocks
| (4) |
where (1), (2), …, (m) is a partition of {1, …, q}, S(i) = {θgj, g ∈ (i), j = 1, …, ng}, and φ̃g = φ̃S(i) = Ave(S(i)) with resultant length rS(i) for any g ∈ (i) and 1 ≤ i ≤ m.
Although the CIRE may not be unique (Ex. S-1), we demonstrate in Theorem 2 that it is almost surely unique under some conditions. In practice, if multiple solutions are realized, then one may select the most plausible solution for the application.
Theorem 2
If the unrestricted estimators are independent and are continuous random variables, then the CIRE is almost surely unique.
2.5 Some Properties of the CIRE and PAVA
Because of the challenges described in the previous section, it may not always be possible to derive the CIRE using PAVA. The focus of this section is to derive some properties of CIRE and PAVA for circular data that are useful for the algorithm described in Section 3. In the following, we let $\hat\phi_g^{\pi} = (\hat\phi_g + \pi) \bmod 2\pi$ denote the estimate φ̂g rotated by π. Also, for a pair of angles φg and φg′, arc(φg, φg′) denotes the simplex corresponding to the smallest arc joining them.
We consider two motivating examples. In Example 1(a) the CIRE is derived using PAVA because the Cauchy mean value property is satisfied here. In Example 1(b) the Cauchy mean value property is not true, and hence PAVA cannot be applied directly. However, if we rotate the unrestricted estimates from φ̂g to $\hat\phi_g^{\pi}$, then the Cauchy mean value property holds in the new orientation, and hence we apply PAVA. The resulting estimates are then retransformed to the original orientation. Such situations are characterized in Proposition 1, which suggests that the location of φ̂ is essential to identify the indices that can be pooled for obtaining the CIRE.
Example 1
(a) Suppose n1 = n2 = n3 = 1 and φ̂ = (0.50, 3.20, 1.75)′. The violation of the inequality occurs between φ̂2 and φ̂3. Because the Cauchy mean value property holds in this case, φ̃PAVA = (0.50, 2.475, 2.475)′ = φ̃.
(b) Suppose n1 = n2 = n3 = 1 and φ̂ = (0.50, 6.00, 1.75)′. Once again, φ̂2 and φ̂3 violate the inequality, and φ̃PAVA = (0.50, 0.73, 0.73)′ with SCE = 0.947, but the Cauchy mean value property is violated here because the pooled value 0.73 does not lie between 1.75 and 6.00. Rotating φ̂ by π, we have φ̂π = (3.64, 2.86, 4.89)′. Now the first and second components of φ̂π violate the order restriction; therefore, PAVA in the rotated orientation gives (3.25, 3.25, 4.89)′. By rotating back to the original orientation, we have φ̃ = (0.108, 0.108, 1.75)′, with SCE = 0.1514, which is less than the SCE of φ̃PAVA.
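The numbers in Example 1(b) can be reproduced with the following Python sketch. It is only a check of this one example, with helper names (`circ_mean`, `sce`) chosen here for illustration; it hard-codes which pair is pooled in each orientation rather than implementing a general algorithm.

```python
import math

TWO_PI = 2 * math.pi

def circ_mean(angles):
    """Mean direction of equally weighted angles, in [0, 2*pi)."""
    S = sum(math.sin(a) for a in angles)
    C = sum(math.cos(a) for a in angles)
    return math.atan2(S, C) % TWO_PI

def sce(est, obs):
    """Sum of circular error with unit weights (one observation per component)."""
    return sum(1 - math.cos(e - o) for e, o in zip(est, obs))

phi_hat = [0.50, 6.00, 1.75]

# Direct pooling of the violating pair (second and third components).
m23 = circ_mean(phi_hat[1:])
pava_est = [phi_hat[0], m23, m23]

# Rotate by pi; in the new orientation the first two components violate the order.
rot = [(a + math.pi) % TWO_PI for a in phi_hat]    # about (3.64, 2.86, 4.89)
m12 = circ_mean(rot[:2])                           # about 3.25
rot_est = [m12, m12, rot[2]]

# Rotate the pooled solution back to the original orientation.
back = [(a - math.pi) % TWO_PI for a in rot_est]   # about (0.108, 0.108, 1.75)

sce_pava = sce(pava_est, phi_hat)
sce_back = sce(back, phi_hat)
```

The rotated solution pools the first two components across the pole, which is why it achieves a much smaller SCE than direct pooling of the second and third components.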
Proposition 1
- Suppose that 0 ∉ arc(φ̂g, φ̃g) and 0 ∉ arc(φ̂g+1, φ̃g+1), then
- Suppose π ∉ arc(φ̂g, φ̃g) and π ∉ arc(φ̂g+1, φ̃g+1), then
Although Proposition 1 provides an important characterization for implementing PAVA, as noted in Example 2, from one step of PAVA to the next, the arc (φ̂g*, φ̃g*) can change for the pooled value φ̂g* obtained by pooling original observations even though ∀g, 0 ∉ arc(φ̂g, φ̃g). Hence, PAVA may not result in the CIRE in such cases.
Example 2
Suppose n1 = n4 = 10, n2 = n3 = 1, θgj = φ̂g, j = 1, …, 10, g = 1, 4, and φ̂ = (3.1, 5.2, 1.6, 3.9). Then, φ̃PAVA = (3.086, 3.086, 3.086, 3.90) and φ̃ = (3.1, 3.922, 3.922, 3.922), with 0 ∉ arc(φ̂g, φ̃g), g = 1, 2, 3, 4.
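The two solutions in Example 2 can be checked numerically. The sketch below assumes that each component is weighted by its resultant length, which equals ng here because all observations within a sample coincide; the helper names `pool` and `sce` are hypothetical.

```python
import math

TWO_PI = 2 * math.pi

def pool(angles, weights):
    """Weighted mean direction of the pooled angles, in [0, 2*pi)."""
    C = sum(w * math.cos(a) for a, w in zip(angles, weights))
    S = sum(w * math.sin(a) for a, w in zip(angles, weights))
    return math.atan2(S, C) % TWO_PI

def sce(est, obs, weights):
    """Weighted sum of circular error."""
    return sum(w * (1 - math.cos(e - o)) for e, o, w in zip(est, obs, weights))

phi_hat = [3.1, 5.2, 1.6, 3.9]
w = [10, 1, 1, 10]                  # resultant lengths r_g = n_g in this example

left = pool(phi_hat[:3], w[:3])     # pooling components 1-3, as PAVA does
right = pool(phi_hat[1:], w[1:])    # pooling components 2-4, as in the reported CIRE
pava_est = [left, left, left, phi_hat[3]]
cire_est = [phi_hat[0], right, right, right]

sce_pava = sce(pava_est, phi_hat, w)
sce_cire = sce(cire_est, phi_hat, w)
```

Both candidates satisfy the order restriction, but pooling the two poorly determined middle components with the fourth rather than the first gives the smaller SCE.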
Depending upon the geometry of the data, the CIRE can take values on the boundary such as 0 or 2π and not the angular mean (Ex. S-2). In Proposition 2, we provide the conditions under which φ̃g takes such values and not the angular mean.
Proposition 2
(a) If π ≤ φ̂g < 2π and 0 ∈ arc(φ̂g, φ̃g), then φ̃g = φ̃g−1 where φ̃0 = 0.
(b) If 0 < φ̂g ≤ π and 0 ∈ arc(φ̂g, φ̃g), then φ̃g = φ̃g+1 where φ̃q+1 = 2π.
From these results, we now identify situations where PAVA results in a CIRE.
Theorem 3
If 0 ∉ arc(φ̂g, φ̂g′) for all g, g′ ∈ {1, …, q} such that φ̃g = φ̃g′, then PAVA yields the CIRE.
Corollary 1
If, for all g = 1, …, q, φ̂g ∈ [η1, η2] such that η1 ≥ 0, η2 ≤ 2π, η2 − η1 ≤ π, then PAVA yields the CIRE.
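The sufficient condition of Corollary 1 is easy to check in code. The sketch below tests whether all unrestricted estimates lie in a common interval [η1, η2] ⊆ [0, 2π] of length at most π, taking η1 and η2 to be the smallest and largest estimates; the function name is hypothetical.

```python
import math

def within_half_circle(phi_hat):
    """True if all angles lie in one interval of [0, 2*pi] of length at most pi."""
    lo, hi = min(phi_hat), max(phi_hat)
    return 0.0 <= lo and hi <= 2 * math.pi and (hi - lo) <= math.pi

ok = within_half_circle([1.0, 2.0, 3.5])             # spread 2.5 <= pi
not_ok = within_half_circle([0.50, 6.00, 1.75])      # spread 5.5 > pi
```

When the check fails, as for the data of Example 1(b), PAVA with circular means is not guaranteed to return the CIRE.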
Proposition 3 shows that the CIRE can be obtained by pooling the original observations step by step if it is known which of the inequalities φi ≤ φi+1 hold as equalities for the CIRE.
Proposition 3
Suppose that for a sample of angles φ̂i, i = 1, 2, …, q, it is known that φ̃g = φ̃g+1. Let φ̂i*, i = 1, 2, …, g, g + 2, … q, be another sample in Rq−1 defined as follows:
3. AN ALGORITHM TO CALCULATE THE CIRE
Based on the characterization of CIRE provided in Theorem 1, a simple-minded strategy would enumerate all possible solutions by pooling adjacent indices and/or suitably assigning 0 or 2π (Proposition 2), and then select the solution with the smallest SCE. Although such a strategy appears to be a simple task, the computational effort increases exponentially with the number of parameters, because there are $2^{q+1}$ possible combinations of blocking adjacent indices. The proposed algorithm is computationally efficient, especially as the number of parameters increases, and results in the true CIRE. We have developed a freely downloadable SAS program.
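For small q, the simple-minded enumeration can be written directly. The following Python sketch is an illustration of that brute-force strategy, not the proposed algorithm: it assumes the candidate set consists of all blockings of adjacent indices with pooled circular means, where the first block may instead be set to 0 and the last block to 2π. All function names are hypothetical.

```python
import math
from itertools import product

TWO_PI = 2 * math.pi

def pool(angles, weights):
    """Weighted mean direction of a block, in [0, 2*pi)."""
    C = sum(w * math.cos(a) for a, w in zip(angles, weights))
    S = sum(w * math.sin(a) for a, w in zip(angles, weights))
    return math.atan2(S, C) % TWO_PI

def sce(est, obs, weights):
    return sum(w * (1 - math.cos(e - o)) for e, o, w in zip(est, obs, weights))

def brute_force_cire(phi_hat, weights=None):
    """Search all adjacent blockings; return (best estimate, its SCE)."""
    q = len(phi_hat)
    if weights is None:
        weights = [1.0] * q
    best_est, best_sce = None, math.inf
    # Each of the q-1 gaps between neighbours is either a cut (1) or a tie (0).
    for cuts in product([0, 1], repeat=q - 1):
        bounds = [0] + [i + 1 for i, c in enumerate(cuts) if c] + [q]
        blocks = list(zip(bounds[:-1], bounds[1:]))
        means = [pool(phi_hat[a:b], weights[a:b]) for a, b in blocks]
        # Optionally push the first block to 0 and the last block to 2*pi.
        for first_zero, last_full in product([False, True], repeat=2):
            vals = list(means)
            if first_zero:
                vals[0] = 0.0
            if last_full:
                vals[-1] = TWO_PI
            if any(x > y for x, y in zip(vals, vals[1:])):
                continue  # violates 0 <= phi_1 <= ... <= phi_q <= 2*pi
            est = [v for v, (a, b) in zip(vals, blocks) for _ in range(b - a)]
            s = sce(est, phi_hat, weights)
            if s < best_sce:
                best_est, best_sce = est, s
    return best_est, best_sce
```

The two nested loops visit 2^(q−1) × 4 = 2^(q+1) candidates, which is practical only for a handful of parameters and motivates the more efficient algorithm described next.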
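We first provide a detailed explanation of the algorithm before summarizing all the steps. A flowchart is provided in the supplementary materials. The proposed algorithm is based on the characterization provided in Theorem 1 using a pair of indices (s, p), with 0 ≤ s < p ≤ q + 1, that satisfies

$$
\tilde\phi_g \in [0, \pi/2)\ \text{for } g \le s, \qquad
\tilde\phi_g \in [\pi/2, 3\pi/2]\ \text{for } s < g < p, \qquad
\tilde\phi_g \in (3\pi/2, 2\pi]\ \text{for } g \ge p, \qquad (5)
$$

and that defines three separate sets of indices as follows: {1, …, s}, {s + 1, …, p − 1}, and {p, …, q}.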
Our algorithm examines all possible selections of pairs (s, p) and, for each (s, p) choice, the best solution φ̃(s,p), if it exists, is obtained. Then the final solution is given by the pair (s, p) for which φ̃(s,p) has the smallest SCE.
We now describe the process of determining φ̃(s,p) for each pair (s, p). The idea is to consider only those poolings that may lead to the best solution.
For each of the three index sets, the out-of-range angles are identified as follows: (a) g ∈ {1, …, s} but φ̂g ∉ [0, π), (b) g ∈ {s + 1, …, p − 1} but φ̂g ∉ [π/2, 3π/2], and (c) g ∈ {p, …, q} but φ̂g ∉ (π, 2π]. If there are no such angles, then Corollary 1 can be applied within each set to obtain φ̃(s,p). On the other hand, for an out-of-range angle φ̂g, we know φ̂g ≠ φ̃(s,p); therefore, it is pooled with adjacent angles or is set equal to 0 or 2π to derive φ̃(s,p). This is done using an efficient sequential examination designed specifically for each of the three sets. For simplicity, in the following we eliminate the superscript (s, p) for φ̃g and denote φ̃0 = 0 and φ̃q+1 = 2π.
For indices in {1, …, s}, we begin with φ̂s and end with φ̂1. For indices such that φ̂g ∉ [0, π), the pooling action depends on whether 3π/2 < φ̂g < 2π or π < φ̂g ≤ 3π/2. In the former case, using Proposition 2, we set φ̃g = φ̃g−1. In the latter case, three possible pooling actions are considered:
If φ̃g ≠ φ̃g+1, necessarily φ̂g must be pooled with angles having smaller indices until the new pooled angle is in range.
If φ̃g−1 ≠ φ̃g, necessarily φ̂g must be pooled with angles having larger indices until the new pooled angle is in range.
If φ̃g = φ̃g+1 = φ̃g−1, necessarily φ̂g must be pooled with φ̂g−1 and φ̂g+1.
For the other two sets, a similar strategy is designed. For indices in {s + 1, …, p − 1}, we begin with φ̂s+1 and end with φ̂p−1. If φ̂g ∉ [π/2, 3π/2], then we explore three possibilities (not necessarily disjoint):
-
φ̃g ≠ φ̃g+1. In this case φ̃gs = ··· = φ̃g, where
If such a gs does not exist, then this solution is rejected. Otherwise, new observations are generated as follows: .
-
φ̃g−1 ≠ φ̃g. In this case
If such a gi does not exist, then this solution is rejected. Otherwise, new observations are generated as follows: .
- φ̃g = φ̃g+1 = φ̃g−1. If g = s + 1 or g = p − 1, then this solution is rejected. Otherwise, new observations are generated as follows:
For indices in {p, …, q}, we begin with φ̂p and end with φ̂q. For indices such that φ̂g ∉ (π, 2π], the pooling action depends on whether 0 < φ̂g < π/2 or π/2 < φ̂g < π.
If 0 <φ̂g <π/2, then set φ̃g = φ̃g+1. Because, in this case, φ̃g ∈ (3π/2, 2π], we therefore have 0 ∈ arc (φ̂g, φ̃g) and so the equality follows from Proposition 2(b). New observations are generated as follows: If φ̂g+1 = 2π, then ; otherwise,
On the other hand, if π/2 < φ̂g < π, then we explore three possibilities (not necessarily disjoint):
-
φ̃g ≠ φ̃g+1. In this case, φ̃gs = ··· = φ̃g, where gs =
If such a gs does not exist, then this solution is rejected. Otherwise, new observations are generated as follows:
-
φ̃g−1 ≠ φ̃g. In this case,If such a gi does not exist, then set . Otherwise, new observations are generated as follows:
- φ̃g = φ̃g+1 = φ̃g−1. If g = p or g = q, then this solution is rejected. Otherwise, new observations are generated as follows:
This procedure guarantees that, for all g, the new observation φ̂g* is “in range” and satisfies the sufficient conditions of Corollary 1. For each (s, p), we apply PAVA on these new “observations” φ̂* within each of the three sets. Proposition 3 guarantees that φ̃(s,p), if it exists, is the one obtained by concatenating those solutions that satisfy (5) and minimize the SCE in each set. Then the CIRE is the one with the smallest SCE among all acceptable φ̃(s,p). In the event of multiple local solutions, we select the one that is practically most plausible. The partition [0, π/2), [π/2, 3π/2], (3π/2, 2π] has been selected because it exploits the results of Section 2.
We now summarize the algorithm. A step-by-step illustration is provided in the supplementary materials.
Step 0. Apply PAVA to obtain φ̃PAVA and set SCE = SCE(φ̃PAVA), φ̃ = φ̃PAVA.
Step 1. Select a pair of indices (s, p), such that 0 ≤ s < p ≤ q + 1. The pair defines the three index sets {1, …, s}, {s + 1, …, p − 1}, and {p, …, q}.
Step 2. Three sets of new subvectors are generated from φ̂, separately for each of the three index sets defined in Step 1. This is done by sequentially examining the indices. As previously described, those indices that are “out of range” are pooled with other indices, following Proposition 2, and indices “in range” remain unchanged.
Step 3. For each of the three index sets and for each new subvector obtained in Step 2, apply PAVA using Corollary 1. Now select only those PAVA-transformed subvectors with components that satisfy (5) and, among these, select the one that minimizes the SCE in each set. Then φ̃(s,p) is obtained by concatenating the best solutions from each set. If SCE(φ̃(s,p)) < SCE, then set SCE = SCE(φ̃(s,p)) and φ̃ = φ̃(s,p).
Step 4. Go to Step 1 and select the next acceptable pair (s, p). The algorithm concludes when all the pairs (s, p) have been examined.
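The outer loop of Steps 1 and 4 can be sketched as follows. The sketch only enumerates the pairs (s, p) and the index sets they induce; it assumes the three sets are {1, …, s}, {s + 1, …, p − 1}, and {p, …, q}, as suggested by the traversal order described above, and it does not implement the pooling of Step 2. The generator name `index_sets` is hypothetical.

```python
def index_sets(q):
    """Yield (s, p, first, middle, last) for all pairs 0 <= s < p <= q + 1."""
    for s in range(0, q + 1):
        for p in range(s + 1, q + 2):
            first = list(range(1, s + 1))     # indices 1, ..., s
            middle = list(range(s + 1, p))    # indices s+1, ..., p-1
            last = list(range(p, q + 1))      # indices p, ..., q
            yield s, p, first, middle, last

pairs = list(index_sets(3))
```

There are (q + 1)(q + 2)/2 such pairs, so the number of outer iterations grows quadratically in q, in contrast to the exponential number of candidates in the exhaustive enumeration.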
4. SIMULATION STUDIES
4.1 Design of the Simulation Experiment
We generated q independent von Mises random variables φ̂ = (φ̂1, φ̂2, …, φ̂q)′ with corresponding mean directions given by φ = (φ1, φ2, …, φq)′ and a common κ. Hence, φ̂ is the UMLE and the CIRE φ̃ is also the RMLE. Motivated by quadratic risk for Euclidean space data, the mean circular error (MCE) of an estimator η̂ = (η̂1, η̂2, …, η̂q)′ for a parameter φ is defined as $\mathrm{MCE}(\hat\eta) = \frac{1}{q} \sum_{i=1}^{q} E\{1 - \cos(\hat\eta_i - \phi_i)\}$. Note that MCE(φ̂) = 1 − I1(κ)/I0(κ), where In(z) is the modified Bessel function of the first kind of order n, and hence MCE(φ̂) depends only on κ.
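The closed form for MCE(φ̂) can be compared with a small Monte Carlo experiment. The sketch below uses only the Python standard library and assumes the MCE is the expected angular distance averaged over the q components, which is the reading consistent with the stated identity; the Bessel functions are evaluated by their power series, and all function names are hypothetical.

```python
import math
import random

def bessel_i(n, z, terms=60):
    """Modified Bessel function of the first kind, integer order n, by power series."""
    return sum((z / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def mce_theory(kappa):
    """MCE of the unrestricted estimator: 1 - I1(kappa) / I0(kappa)."""
    return 1 - bessel_i(1, kappa) / bessel_i(0, kappa)

def mce_simulated(phi, kappa, runs=20000, seed=0):
    """Monte Carlo MCE of independent von Mises draws around the true angles phi."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        draws = [rng.vonmisesvariate(mu, kappa) for mu in phi]
        total += sum(1 - math.cos(d - mu) for d, mu in zip(draws, phi)) / len(phi)
    return total / runs

theory = mce_theory(10.0)
simulated = mce_simulated([1.0, 2.0, 3.0], 10.0)
```

The simulated value does not depend on the true angles or on q, only on κ, in line with the remark above.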
Because as κ → 0 the von Mises distribution approaches the uniform distribution on a circle, and as κ → ∞ it approaches the normal distribution, we considered κ = 0.5, 1, 10 to understand the impact of κ on the performance of RMLE. We also considered three values of q, namely, 3, 5, and 10. Although simulation studies for a variety of patterns of φ were conducted, in the interest of space, we present the results for φ1 = φ2 = ··· = φq (denoted by φ0), the most interesting case for RMLE. Results are summarized in Figure 2, with ordinates MCE(φ̂), MCE(φ̃) and abscissa φ0 ∈ [0, 6.2], with an increment of 0.1. The solid line represents MCE(φ̂). The values of MCE(φ̃) are represented by either longer dashed lines (q = 3), shorter dashed lines (q = 5), or dotted lines (q = 10). Each MCE value was computed using 5,000 simulation runs.
Figure 2.

MCE(φ̃) and MCE(φ̂), for a 3-, 5-, and 10-dimensional parameter when all coordinates are equal and for κ values 0.5, 1, and 10.
4.2 Results of the Simulation Experiment
As expected, for a given κ, MCE(φ̂) is constant in φ0 and q, but MCE(φ̃) is not. For larger values of κ and q, MCE(φ̃) is uniformly smaller than MCE(φ̂) for all φ0. As q increases, the performance of RMLE is better even for very small values of κ. For example, when q = 10 and κ = 0.5, MCE(φ̃) ≤ MCE(φ̂) (Fig. 2). However, this is not true for very small q and κ. For instance, if q = 3 and κ = 0.5, then MCE(φ̃) > MCE(φ̂) for 1.75 ≤ φ0 ≤ 4.5. Perhaps φ̃ performs poorly as κ → 0 because the von Mises distribution in this case converges to the uniform distribution on a circle, which is not informative about φ0. In our applications, based on the results of Liu et al. (2004), we expect κ > 10. Hence, we expect RMLE to perform well in practice.
5. ILLUSTRATION
We illustrate our methodology using the unrestricted estimates of the phase angles obtained by Liu et al. (2004) for data by Whitfield et al. (2002) labeled “Thythy3” in the database www.cyclebase.org/. Liu et al. (2004) estimated the phase angles of CCNE1, RRM2, CCNA2, CCNB1, and VEGFC as 0.56, 5.36, 3.55, 2.67, and 2.66 radians, respectively. These unrestricted estimates do not satisfy 0 ≤ φCCNE1 ≤ φRRM2 ≤ φCCNA2 ≤ φCCNB1 ≤ φVEGFC ≤ 2π.
In addition to the CIRE, for comparison purposes we also implemented the standard PAVA on the unrestricted estimates. Results are summarized in Table 1. As expected, the PAVA estimate has a larger SCE than the CIRE.
Table 1.
Estimates of phase angles and phase classifications from different procedures.
| Gene | Phase* | Unrestricted: Phase | Unrestricted: Angle | PAVA: Phase | PAVA: Angle | CIRE: Phase | CIRE: Angle |
|---|---|---|---|---|---|---|---|
| CCNE1 | G1/S (W, S) | S | 0.56 radians (1 hr 22 min) | S | 0.56 radians (1 hr 22 min) | S | 0 |
| RRM2 | S (G, W, S) | G1 | 5.36 radians (13 hr 8 min) | M | 3.27 radians (8 hr 1 min) | S | 0 |
| CCNA2 | G2 (G, W, S) | G1 | 3.55 radians (8 hr 42 min) | M | 3.27 radians (8 hr 1 min) | M | 2.95 radians (7 hr 14 min) |
| CCNB1 | G2/M (W), M (S) | G2 | 2.67 radians (6 hr 32 min) | M | 3.27 radians (8 hr 1 min) | M | 2.95 radians (7 hr 14 min) |
| VEGFC | M (G), M/G1 (W, S) | G2 | 2.66 radians (6 hr 31 min) | M | 3.27 radians (8 hr 1 min) | M | 2.95 radians (7 hr 14 min) |
| Sum of Circular Error (SCE) | | | | | 1.8902 | | 0.8048 |
NOTE: All angles are estimated from the beginning of the S phase.
*Sources of the phase assignments: G, Gauthier et al. (2008); W, Whitfield et al. (2002); S, Datta and Sokhansanj (2007).
From Liu et al. (2004) we know that the approximate period of the cell cycle is about 15 hours 24 minutes. Thus, each radian represents approximately 2 hours 27 minutes. According to Gauthier et al. (2008) (www.cyclebase.org/), for the data discussed in this section, the S phase lasts approximately 3 hours 33 minutes, the G2 phase is about 3 hours 5 minutes, the M phase is about 1 hour 32 minutes, and the G1 phase is about 7 hours 14 minutes. The phases and the duration of time, along with the cumulative time from the S phase are summarized in Table 2. Using these estimates we converted our phase angle estimates into cell cycle phases. Thus in Table 1, corresponding to each estimation procedure, we provide the estimated phase angles as well as the estimated phases of each gene. In column 2 we provide the phase of each gene as determined by Gauthier et al. (2008) (denoted as G), Datta and Sokhansanj (2007) (denoted by S), and those by Whitfield et al. (2002) (denoted by W). For some genes, Gauthier et al. (2008) did not provide their estimates of the phase, and in such cases G is missing. Results are displayed graphically in Figure 3.
Table 2.
Duration of time spent in each phase of the cell cycle in the double thymidine block experiment of Whitfield et al. (2002)
| Phase | Duration of Time in the Phase | Time Interval |
|---|---|---|
| S | 3 hr 33 min | 0–3 hr 33 min |
| G2 | 3 hr 05 min | 3 hr 33 min–6 hr 38 min |
| M | 1 hr 32 min | 6 hr 38 min–8 hr 10 min |
| G1 | 7 hr 14 min | 8 hr 10 min–15 hr 24 min |
NOTE: Data provided by N. Gauthier.
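The conversion from phase angles to elapsed time and cell cycle phase can be sketched as follows. The period of 15 hr 24 min and the phase boundaries are those quoted in the text and in Table 2; the function names and the convention of half-open intervals are choices made here for illustration.

```python
import math

PERIOD_MIN = 15 * 60 + 24   # cell cycle period: 15 hr 24 min = 924 min

# Phase boundaries in minutes from the beginning of the S phase (Table 2).
PHASES = [("S", 0, 213), ("G2", 213, 398), ("M", 398, 490), ("G1", 490, 924)]

def angle_to_minutes(angle):
    """Elapsed time (minutes from the start of S) for a phase angle in radians."""
    return (angle % (2 * math.pi)) / (2 * math.pi) * PERIOD_MIN

def phase_of(angle):
    """Cell cycle phase containing the given phase angle."""
    t = angle_to_minutes(angle)
    for name, start, end in PHASES:
        if start <= t < end:
            return name
    return PHASES[-1][0]

minutes_rrm2 = angle_to_minutes(5.36)   # about 788 min, i.e., 13 hr 8 min
```

For instance, the unrestricted estimate 5.36 radians for RRM2 falls in G1, whereas the pooled PAVA value 3.27 radians falls in M.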
Figure 3.

(a) Unrestricted estimates from Liu et al. (2004). (b) PAVA. (c) CIRE. Dotted arrows point to the estimates and the solid lines are the boundaries between phases. Angles are in radians reported in Table 1. Phase angle estimates are in parentheses.
Consistent with earlier findings reported in column 2, all estimation procedures assigned CCNE1 to the S phase. However, the three estimation procedures differed on the phase assignment of RRM2, with the CIRE’s assignment agreeing with that in column 2. A surprising result was noted with respect to CCNA2. Gauthier et al. (2008), Datta and Sokhansanj (2007), and Whitfield et al. (2002) assigned this gene to the G2 phase, whereas CIRE assigned CCNA2 to the M phase. Based on the expression profile of CCNA2 (Fig. S-5 in the supplementary materials) across three periods (a total of about 46 hours), it appears that the peak expression of CCNA2 is more likely to be during the M phase than during the G2 phase. Thus, based on the mRNA data of Whitfield et al. (2002) and the unrestricted phase angle estimates of Liu et al. (2004), our phase assignment of CCNA2 seems to be more reasonable than that of Gauthier et al. (2008), Datta and Sokhansanj (2007), and Whitfield et al. (2002). The result of PAVA for gene RRM2 appears to be inconsistent with CIRE. PAVA assigns RRM2 to the M phase whereas the CIRE agrees with Gauthier et al. (2008) and Whitfield et al. (2002).
Using the strategy described in the Introduction, the phase angle estimates under the rotation-invariant simple order restriction for the five genes, CCNE1, RRM2, CCNA2, CCNB1 and VEGFC are 6.10, 6.10, 2.95, 2.95 and 2.95, respectively.
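The rotation-invariant estimates quoted above can be reproduced for these five genes with a brute-force sketch. It applies an exhaustive search under each cyclic shift of the simple order and keeps the fit with the smallest SCE. This is an illustration under stated assumptions (unit weights, candidate blocks given by pooled circular means with optional 0 or 2π at the ends), not the algorithm of Section 3; all function names are hypothetical.

```python
import math
from itertools import product

TWO_PI = 2 * math.pi

def pooled(angles):
    """Mean direction of equally weighted angles, in [0, 2*pi)."""
    return math.atan2(sum(map(math.sin, angles)), sum(map(math.cos, angles))) % TWO_PI

def sce(est, obs):
    return sum(1 - math.cos(e - o) for e, o in zip(est, obs))

def simple_order_fit(obs):
    """Exhaustive search under 0 <= x_1 <= ... <= x_q <= 2*pi; returns (SCE, fit)."""
    q = len(obs)
    best_sce, best_est = math.inf, None
    for cuts in product([0, 1], repeat=q - 1):
        bounds = [0] + [i + 1 for i, c in enumerate(cuts) if c] + [q]
        blocks = list(zip(bounds[:-1], bounds[1:]))
        means = [pooled(obs[a:b]) for a, b in blocks]
        for first_zero, last_full in product([False, True], repeat=2):
            vals = list(means)
            if first_zero:
                vals[0] = 0.0
            if last_full:
                vals[-1] = TWO_PI
            if any(x > y for x, y in zip(vals, vals[1:])):
                continue  # not ordered
            est = [v for v, (a, b) in zip(vals, blocks) for _ in range(b - a)]
            s = sce(est, obs)
            if s < best_sce:
                best_sce, best_est = s, est
    return best_sce, best_est

def rotation_invariant_fit(obs):
    """Try every cyclic shift of the order and keep the fit with the smallest SCE."""
    q = len(obs)
    best_sce, best_est = math.inf, None
    for k in range(q):
        rotated = obs[k:] + obs[:k]          # simple order starting at component k
        s, est = simple_order_fit(rotated)
        if s < best_sce:
            # Undo the index rotation so the fit is in the original gene order.
            best_sce = s
            best_est = [est[(i - k) % q] for i in range(q)]
    return best_sce, best_est

# CCNE1, RRM2, CCNA2, CCNB1, VEGFC (unrestricted estimates from the text).
best_sce, best_est = rotation_invariant_fit([0.56, 5.36, 3.55, 2.67, 2.66])
```

For these data the selected order starts at CCNA2, which allows CCNE1 and RRM2 to be pooled across the pole at about 6.10 radians instead of being set to the boundary value.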
6. CONCLUDING REMARKS AND OPEN PROBLEMS
This article takes the first step towards statistical inference on angular parameters when they are ordered around a unit circle. We have developed the theory and methodology for computing the circular isotonic regression estimators when the components of the angular parameter φ are constrained by a simple order restriction on a unit circle.
There are several problems that remain to be addressed and serve as topics for future research. For instance, it would be useful to explore theoretical properties of the proposed estimation procedures along the lines of Rueda, Salvador, and Fernández (1997). Other interesting research questions include the determination of standard errors for the CIRE and the associated confidence intervals using resampling procedures such as those explored in Rueda, Menéndez, and Salvador (2002) for Rq data.
A cell biologist may be interested in testing the hypothesis that a subset of cell cycle genes followed a particular order around the circle during cell division. Before deriving the suitable test statistic and its null distribution, an important aspect of this problem is the formulation of suitable null and alternative hypotheses.
As commonly done for Euclidean space data, it would be useful to develop methods for estimating angular parameters by combining data from multiple laboratories/experiments, while accounting for variation among experiments. For example recently three different laboratories (Rustici et al. 2004; Oliva et al. 2005; Peng et al. 2005) conducted a total of ten different yeast cell cycle experiments under two different platforms. The experimental conditions among the 10 experiments are not necessarily identical. Consequently, it may not be reasonable to assume that the concentration parameters of the phase angles from such experiments are the same (and known). It would be useful to derive rotation-invariant simple order restricted estimators using data from such heterogeneous experiments.
In summary, we believe there are several theoretical problems in this line of research that may have applications to cell biology, and possibly other applied fields.
Acknowledgments
Research of Rueda and Fernández was supported in part by Spanish DOES (grant MTM2004-07740) and by PAPIJCL (grant VA047A05); and Peddada’s research was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences [Z01 ES 101744-04]. The authors thank Drs. Dunson, Umbach, Weinberg, and Guo; two referees; the associate editor; and the editor for numerous suggestions that improved the presentation of the manuscript. They specifically thank Drs. Umbach and Weinberg for suggesting that they extend their methodology to rotation-invariant simple order around the circle, a biologically important order restriction. The authors also thank Dr. Gauthier, Technical University of Denmark, for the permission to use the figure from www.cyclebase.org, and for providing the duration of phases in Whitfield’s data.
APPENDIX: PROOFS
For u = (u1, u2, …, uq)′, φ̂ = (φ̂1, φ̂2, …, φ̂q)′, we define . Let C = {x ∈ [0, 2π]q: 0 ≤ x1 ≤ x2 ≤ ··· ≤ xq ≤ 2π} denote the simple order set and let CJ = {x ∈ [0, 2π]q: xi = xi+1 if i ∈ J, and xi < xi+1 if i ∈ I − J} denote a face of C. The “relative interior” of C is CI = {x ∈ [0, 2π]q: 0 < x1 < x2 < ··· < xq < 2π}, with the boundary of C being CB = C − CI = ∪J⊂I CJ. First we prove the following lemma, which will be useful for proving the other results.
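As a concrete reading of these set definitions, the following Python sketch checks membership in the simple order set C and in its relative interior, and reports the index set J of the face on which a point lies. The function names, the tolerance `tol`, and the use of 1-based indices for J are illustrative assumptions and do not appear in the original text.

```python
import math

TWO_PI = 2.0 * math.pi

def in_simple_order_set(x, tol=1e-12):
    """Return True if 0 <= x1 <= x2 <= ... <= xq <= 2*pi (up to tol)."""
    padded = [0.0] + list(x) + [TWO_PI]
    return all(b - a >= -tol for a, b in zip(padded, padded[1:]))

def in_relative_interior(x, tol=1e-12):
    """Return True if 0 < x1 < x2 < ... < xq < 2*pi (strict, beyond tol)."""
    padded = [0.0] + list(x) + [TWO_PI]
    return all(b - a > tol for a, b in zip(padded, padded[1:]))

def face_index_set(x, tol=1e-12):
    """Return J = {i : x_i = x_{i+1}} (1-based indices) for a point x in C."""
    if not in_simple_order_set(x, tol):
        raise ValueError("x is not in the simple order set C")
    return {i + 1 for i in range(len(x) - 1) if abs(x[i + 1] - x[i]) <= tol}
```

For instance, `face_index_set([0.5, 1.0, 1.0, 3.0])` identifies the single tie between the second and third components, so the point lies on the face with J = {2}.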
A.1 Lemma 1
Let u = (u1, u2, …, uq)′, v = (v1, v2, …, vq)′, and φ̂ = (φ̂1, φ̂2, …, φ̂q)′. Suppose, for i = 1, 2, …, q, ui, vi, and φ̂i ∈[0, 2π], then the following holds:
(a) If for all i = 1, 2, …, q,
(b) If for all i = 1, 2, …, q,
In both cases (a) and (b), G(φ̂, v) = G(φ̂, u) ⇔ v = u
Proof
(a) The proof follows from the fact that cos(x) decreases in |x| if x ∈ [−π, π].
(b) The proof follows from the fact that cos(x) increases in |x| if x ∈ [π, 2π] or x ∈ [−2π, −π].
Clearly, if u = v, then G(φ̂, v) = G(φ̂, u). To prove the converse, suppose that G(φ̂, v) = G(φ̂, u) and that there exists an index i such that ui ≠ vi and cos(φ̂i − vi) < cos(φ̂i − ui). This implies that G(φ̂, v) ≠ G(φ̂, u), which is a contradiction. Hence, it is necessary that u = v.
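The two monotonicity facts behind the proof can be checked numerically. The sketch below is only an illustration, not part of the original argument: it verifies on a grid that cos(x) is non-increasing in |x| for |x| in [0, π] and non-decreasing in |x| for |x| in [π, 2π]. The helper name `is_monotone_in_abs`, the grid size `n`, and the tolerance are assumptions made for this example.

```python
import math

def is_monotone_in_abs(lo, hi, increasing, n=2001, tol=1e-12):
    """Check on a grid whether cos(x) is monotone in |x| for |x| in [lo, hi].

    Both signs of x are tested, so the check covers x in [lo, hi]
    and x in [-hi, -lo].
    """
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    for sign in (1.0, -1.0):
        vals = [math.cos(sign * a) for a in grid]
        diffs = [b - a for a, b in zip(vals, vals[1:])]
        if increasing and any(d < -tol for d in diffs):
            return False
        if not increasing and any(d > tol for d in diffs):
            return False
    return True
```

Calling `is_monotone_in_abs(0.0, math.pi, increasing=False)` covers the range used in part (a), and `is_monotone_in_abs(math.pi, 2 * math.pi, increasing=True)` covers the range used in part (b).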
A.2 Proof of Theorem 1
The problem is to find the minimum of a continuous function in a closed, bounded, and convex set in Rq. From standard results in real analysis we know that the function is bounded below and attains its minimum. We first prove that the minimum is attained on the boundary of the convex cone C. Let x0 = 0 and xq+1 = 2π. Assume that the minimum φ̃ is attained in CI and that φ̂ ∉ C. Let v = φ̃ − φ̂. Then we demonstrate that none of the following three possibilities can be true:
(a) −π < vi < π, ∀i = 1, …, q.
(b) There exists an index i such that vi = φ̃i − φ̂i > π.
(c) There exists an index i such that vi = φ̃i − φ̂i < −π.
If (a) is true, then there exists a λ between 0 and 1 such that b = φ̂ + λv ∈ CB. From Lemma 1, we have G(φ̂, b) < G(φ̂, φ̃), but this contradicts the fact that φ̃ is the CIRE. Suppose (b) is true. Then we have φ̃i+1 − φ̂i > φ̃i − φ̂i > π. Now consider a new point b with bg = φ̃g if g ≠ i and bi = φ̃i+1; then clearly b ∈ CB, and from Lemma 1 we must have G(φ̂, b) < G(φ̂, φ̃), which again contradicts the fact that φ̃ is the CIRE. Similar to (b), we can also demonstrate that (c) cannot be true. Next we prove that φ̃S(i) = Ave(S(i)). We begin by showing that
φ̃S(i−1) ≤ Ave(S(i)) ≤ φ̃S(i+1). (A.1)
If (A.1) were false, then it is necessary that one of the following four statements is true:
(a) Ave(S(i)) < φ̃S(i−1) and φ̃S(i) − Ave(S(i)) < π.
(b) Ave(S(i)) > φ̃S(i+1) and φ̃S(i) − Ave(S(i)) > π.
(c) Ave(S(i)) < φ̃S(i−1) and φ̃S(i) − Ave(S(i)) > π.
(d) Ave(S(i)) > φ̃S(i+1) and φ̃S(i) − Ave(S(i)) < π.
In the following we demonstrate that none of these four statements can be true. Suppose (a) is true. Define bg = φ̃S(i−1) if g ∈ S(i); otherwise define bg = φ̃g. Then we have b ∈ C. Now, applying Lemma 1, we have G(φ̂, b) < G(φ̂, φ̃), which is a contradiction. Using a similar argument and the same vector b, we can prove that (b) cannot be true either. Letting eg = φ̃S(i+1) if g ∈ S(i), and eg = φ̃g otherwise, we have e ∈ C. Using an argument similar to those used in (a) and (b), we can also prove that cases (c) and (d) do not hold. If g ∈ S(i), then let φ̆g = Ave(S(i)); otherwise, let φ̆g = φ̃g. Then G(φ̂, φ̆) ≤ G(φ̂, φ̃). From (A.1) we have φ̆ ∈ C; hence, φ̃ = φ̆ and Ave(S(i)) = φ̃S(i).
A.3 Proof of Theorem 2
Let LJ be the biggest subspace contained on a face CJ of C. Then the minimization problem arg minφ∈LJ G(φ̂, φ) has a unique solution (Mardia and Jupp 2000). Suppose that there are a total of r different subspaces (Li, i = 1, …, r) associated with faces of C, with the corresponding minimum values denoted by Gi(φ̂) = minφ∈Li G(φ̂, φ), i = 1, 2, …, r. Then Pr[∃i, i′ ∈ {1, …, r} with Gi(φ̂) = Gi′(φ̂)] = Pr[∪i,i′(Gi(φ̂) = Gi′(φ̂))] ≤ ∑i,i′ Pr[Gi(φ̂) = Gi′(φ̂)] = 0. The sum has a finite number of summands, and the sets inside the brackets have probability zero because they are Lebesgue measure zero sets; hence the result follows.
A.4 Proof of Proposition 1
We prove (a); the proof of (b) follows similarly. If φ̃g < φ̃g+1, then consider the following possibilities: (i) φ̂g ≤ φ̃g, (ii) φ̂g > φ̃g, φ̂g ≤ φ̃g+1, and (iii) φ̂g > φ̃g, φ̂g ≥ φ̃g+1. We prove that none of these three can hold. Suppose (i) is true. Then φ̂g ≤ φ̃g ⇒ φ̂g+1 < φ̂g ≤ φ̃g < φ̃g+1 and φ̃g − φ̂g+1 < φ̃g+1 − φ̂g+1 < π.
The last inequality in the previous expression is a consequence of 0 ∉ arc(φ̃g+1, φ̂g+1). Define bi = φ̃g+1, if i = g, else define bi = φ̃i, then b ∈ C. Now, applying Lemma 1, we have G(φ̂, b) < G(φ̂, φ̃), which is a contradiction.
Suppose (ii) is true; then, φ̃g < φ̂g < φ̃g+1. Letting bi = φ̂g if i = g, and bi = φ̃i otherwise, and using an argument similar to that in (i), we conclude that (ii) cannot be true either. Finally, suppose (iii) is true. Then we have φ̂g ≥ φ̃g+1 > φ̃g and 0 < φ̂g − φ̃g+1 < φ̂g − φ̃g < π. The last inequality is a result of the fact that 0 ∉ arc(φ̃g, φ̂g). Let bi = φ̃g+1 if i = g; otherwise let bi = φ̃i. Then, again from Lemma 1, we have G(φ̂, b) < G(φ̂, φ̃), which is a contradiction.
A.5 Proof of Proposition 2
We prove (b); the proof of (a) follows similarly. Suppose that π ≤ φ̂g < 2π, 0 ∈ arc(φ̂g, φ̃g), and φ̃g > φ̃g−1; then 0 < φ̂g − φ̃g−1 < φ̂g − φ̃g < π. Let bi = φ̃g−1 if i = g; otherwise let bi = φ̃i. Then b ∈ C. Applying Lemma 1, we have G(φ̂, b) < G(φ̂, φ̃), which is a contradiction.
A.6 Proof of Theorem 3
The proof of Theorem 3 follows by applying Proposition 1 (a) sequentially to each pair of indices in a reverse order.
A.7 Proof of Corollary 1
The result follows from Theorem 2.9 as
A.8 Proof of Proposition 3
If φ̃g = φ̃g+1, then
where φ̂g* = Ave(θij, i = g, g + 1; j = 1, …, ni), ng* = ng + ng+1, φ̂i* = φ̂i ∀i ≠ g, g + 1, rg* is the resultant length of the angles (θij, i = g, g + 1; j = 1, …, ni), and ri* = ri ∀i ≠ g, g + 1. Hence the result follows.
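The pooled quantities in this proof can be illustrated with a short Python sketch. It assumes that Ave denotes the sample mean direction and that the resultant length is the length of the summed unit vectors, as is standard in directional statistics (Mardia and Jupp 2000). The function names `resultant` and `pool` and the example angles are hypothetical.

```python
import math

def resultant(angles):
    """Return (mean direction reduced modulo 2*pi, resultant length) for angles in radians.

    The mean direction is undefined when the resultant length is zero.
    """
    c = sum(math.cos(t) for t in angles)
    s = sum(math.sin(t) for t in angles)
    r = math.hypot(c, s)
    phi = math.atan2(s, c) % (2.0 * math.pi)
    return phi, r

def pool(group_g, group_g1):
    """Merge two adjacent groups and return (phi_star, r_star, n_star)."""
    merged = list(group_g) + list(group_g1)
    phi_star, r_star = resultant(merged)
    return phi_star, r_star, len(merged)

# Hypothetical angles for groups g and g + 1.
theta_g = [1.0, 1.2, 1.1]
theta_g1 = [0.8, 0.9]
phi_star, r_star, n_star = pool(theta_g, theta_g1)
```

Under these assumptions, the resultant vector of the pooled sample is the sum of the two group resultant vectors, so the pooled mean direction and resultant length can also be obtained from (φ̂g, rg) and (φ̂g+1, rg+1) alone, without returning to the raw angles.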
Contributor Information
Cristina Rueda, Department of Statistics and Operations Research, University of Valladolid, Valladolid, Spain.
Miguel A. Fernández, Department of Statistics and Operations Research, University of Valladolid, Valladolid, Spain.
Shyamal Das Peddada, Biostatistical Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709 (peddada@niehs.nih.gov).
REFERENCES
- Akashi M, Ichise T, Mamine T, Takumi T. Molecular Mechanism of Cell-Autonomous Circadian Gene Expression of Period2, a Crucial Regulator of the Mammalian Circadian Clock. Molecular Biology of the Cell. 2006;17:555–565. doi: 10.1091/mbc.E05-05-0396.
- Bjorklund S, Skog S, Tribukait B, Thelander L. S-Phase-Specific Expression of Mammalian Ribonucleotide Reductase R1 and R2 Subunit mRNAs. Biochemistry. 1990;29:5452–5458. doi: 10.1021/bi00475a007.
- Datta S, Sokhansanj B. Accelerated Search for Biomolecular Network Models to Interpret High-Throughput Experimental Data. BMC Bioinformatics. 2007;8:258. doi: 10.1186/1471-2105-8-258.
- Gauthier NP, Larsen ME, Wernersson R, de Lichtenberg U, Jensen LJ, Brunak S, Jensen TS. Cyclebase.org: A Comprehensive Multi-organism Online Database of Cell Cycle Experiments. Nucleic Acids Research. 2008;36(database issue):D854–D859. doi: 10.1093/nar/gkm729.
- Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C. A Random Periods Model for Expression of Cell Cycle Genes. Proceedings of the National Academy of Sciences of the United States of America. 2004;101:7240–7245. doi: 10.1073/pnas.0402285101.
- Mardia K, Jupp P. Directional Statistics. Chichester: Wiley; 2000.
- Nakamura T, Sellix M, Menaker M, Block G. Estrogen Directly Modulates Circadian Rhythms of PER2 Expression in the Uterus. American Journal of Physiology, Endocrinology and Metabolism. 2008;295:E1025–E1031. doi: 10.1152/ajpendo.90392.2008.
- Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J. The Cell Cycle-Regulated Genes of Schizosaccharomyces pombe. PLoS Biology. 2005;3:e225. doi: 10.1371/journal.pbio.0030225.
- Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Balasubramanian M, Liu J. Identification of Cell Cycle-Regulated Genes in Fission Yeast. Molecular Biology of the Cell. 2005;16:1026–1104. doi: 10.1091/mbc.E04-04-0299.
- Pines J, Hunter T. Isolation of a Human Cyclin cDNA: Evidence for Cyclin mRNA and Protein Regulation in the Cell Cycle and for Interaction With p34cdc2. Cell. 1989;58:833–846. doi: 10.1016/0092-8674(89)90936-7.
- Pines J, Hunter T. Human Cyclin A Is Adenovirus E1A-Associated Protein p60 and Behaves Differently From Cyclin B. Nature. 1990;346:760–763. doi: 10.1038/346760a0.
- Rueda C, Menéndez JA, Salvador B. Bootstrap Adjusted Estimators in a Restricted Setting. Journal of Statistical Planning and Inference. 2002;107:123–131.
- Rueda C, Salvador B, Fernández MA. Simultaneous Estimation in a Restricted Linear Model. Journal of Multivariate Analysis. 1997;61:61–66.
- Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bahler J. Periodic Gene Expression Program of the Fission Yeast Cell Cycle. Nature Genetics. 2004;36:809–817. doi: 10.1038/ng1377.
- Robertson T, Wright FT. Algorithms in Order-Restricted Statistical Inference and the Cauchy Mean Value Property. The Annals of Statistics. 1980;8:645–651.
- Robertson T, Wright FT, Dykstra RL. Order Restricted Statistical Inference. New York: Wiley; 1988.
- Silvapulle M, Sen PK. Constrained Statistical Inference. New York: Wiley; 2005.
- Van Eeden C. Restricted Parameter Space Estimation Problems. New York: Springer; 2006.
- Whitfield ML, Sherlock G, Saldanha A, Murray J, Ball C, Alexander K, Matese J, Perou C, Hurt M, Brown P, Botstein D. Identification of Genes Periodically Expressed in the Human Cell Cycle and Their Expression in Tumors. Molecular Biology of the Cell. 2002;13:1977–2000. doi: 10.1091/mbc.02-02-0030.
- Xiao E, Xia-Zhang L, Barth A, Zhu J, Ferin M. Stress and Menstrual Cycle: Relevance of Cycle Quality in the Short- and Long-Term Response to a 5-Day Endotoxin Challenge During the Follicular Phase in the Rhesus Monkey. Journal of Clinical Endocrinology and Metabolism. 1998;83:2454–2460. doi: 10.1210/jcem.83.7.4926.
