Abstract
We formulate the estimation of a monotone response surface of multiple factors as the inverse of an iteration of partially ordered classifier ensembles. Each ensemble (called a PIPE-classifier) is a projection of Bayes classifiers onto the constrained space. We prove that the inverse of PIPE-classifiers (iPIPE) exists, and propose algorithms to compute iPIPE efficiently by reducing the space over which optimisation is conducted. The methods are applied in analysis and simulation settings where the surface dimension is higher than what the isotonic regression literature typically considers. Simulation shows that iPIPE-based credible intervals achieve nominal coverage probability and are more precise than unconstrained estimation.
Keywords: Clinical decision support tool, partial ordering, posterior quantiles, sweep algorithm, weighted posterior gain
1 |. INTRODUCTION
In clinical studies and health systems, interventions and patient conditions are often defined by multiple factors. To assess the total effect of an intervention or a condition, we can estimate the response surface as a multivariate function of the individual factors. In this article, we focus on monotone response surfaces, a reasonable assumption in many applications such as dose-response studies and clinical decision making. Specifically, consider a study with multi-factor conditions. Let denote the th condition, for , where indicates the state of the th factor, and denote the parameter of interest associated with the condition. We are concerned with the estimation of the monotone response surface , where the number is large in many applications. We assume without loss of generality that is nondecreasing in in terms of the partial Euclidean ordering (≻): if , then , where denotes the event for each component with at least one strict inequality.
To motivate our work, consider a recent Delphi study where an expert panel identified important factors that influence the selection of postacute care for stroke patients (Stein et al., 2022), including four main factors: likelihood of benefitting from active rehabilitation (factor 1), need for clinicians with specialised rehabilitation skills (factor 2), need for ongoing medical and nursing care (factor 3), and patient’s ability to tolerate rehabilitation (factor 4); and three minor factors: family/caregiver support (factor 5), likelihood of return to community (factor 6), and ability to return to physical home (factor 7). While the presence of each of these factors increases the likelihood of referral to an inpatient rehabilitation program, we currently plan to conduct chart reviews to understand the combined effects of these factors and develop a clinical decision support tool. In each patient chart, a main factor will be scored as 0, 1, and 2 respectively for the answers “no”, “uncertain”, and “yes”, and a minor factor will be scored as 0 for “no/uncertain” and 1 for “yes”. The outcome will be noted as whether the patient was referred to rehabilitation. In summary, each condition consists of factors with each taking on two or three possible values and there are a total of conditions. An unconstrained estimate of each , the underlying referral probability associated with condition , can be obtained based on the number of patients under condition and the number of patients referred to rehabilitation among them, for . In the simple cases of estimating population parameters associated with different conditions, we will use , to generically denote the data associated with condition with distribution where the nuisance parameter may be shared across conditions. Notation for increasingly complicated model setups will be defined and explained for specific applications in Section 4.
There is a large literature on monotone or isotonic regression, which can be formulated as a restricted least squares optimisation problem and can be solved using the pool-adjacent-violators-algorithm (PAVA); see Brunk (1955), Ayer et al. (1955), Barlow et al. (1972), and Robertson et al. (1988). Numerous approaches have been proposed to deal with multivariate isotonic regression, including additive models (Bacchetti, 1989; Morton-Jones et al., 2000), spline methods using monotone basis functions (Ramsay, 1988; Leitenstorfer and Tutz, 2007), Bayesian mixture modeling (Bornkamp et al., 2010), and projective Gaussian process (Lin and Dunson, 2014). These methods deal with situations with continuous functions, where additional assumptions such as piecewise linearity, additivity, and smoothness are used to make computations tractable. A monotone response surface defined over a continuous support may also be estimated using tensor-product splines with monotone basis functions on the margins and appropriate constraints on the coefficients of the basis functions; see Wang and Taylor (2004) for an application that uses -spline bases. As such, computations can be formulated as a convex optimisation problem for which statistical packages such as the R package CVXR (Fu et al., 2020) can implement quite efficiently. Isotonic regression has also been recently studied for survival analysis. Chung et al. (2018) propose a pseudo-iterative convex minorant algorithm, with theoretical justifications, to implement PAVA and maximise the partial likelihood under isotonic proportional hazards models for right-censored data. While their approach exhibits computational stability under a piecewise constant assumption, the algorithm is applied to a single continuous covariate and focuses on point estimation. Generally, optimisation and inference of isotonic regression can become complex and challenging as the dimension increases, while most methods have been demonstrated in problems with relatively low dimension ( to 4).
Our work is motivated by a number of considerations that render the above-mentioned approaches not directly applicable. First, we focus on applications where the response surface is observed on discrete levels per factor with a moderate-to-high number of factors. While Wright (1982) studies maximum likelihood estimation of a univariate function observed on discrete levels, there is relatively little discussion on isotonic regression for multiple discrete factors. Second, we adopt a Bayesian decision-theoretic framework to deal with inference including interval estimation. For multivariate isotonic functions of discrete factors, a pragmatic Bayesian approach will first draw from the posterior distribution based on an unconstrained model and then include only draws that meet partial ordering; see Holmes and Heard (2003) for example. As will be illustrated, the constrained posterior thus obtained may cause bias and its feasibility is limited as increases. Third, to ensure broad applicability, we aim to develop a general approach that can work with different statistical and regression models, including generalised linear models, mixed effects models, Bayesian hierarchical models, and survival models.
To address the above considerations, in this article, we propose estimation of the response surface by inverting an iterated sequence of partially ordered classifier ensembles, each of which is solved by projecting unconstrained Bayes classifiers onto the space constrained by partial ordering. The proposed classifier ensemble may be viewed as an extension of the product-of-independent-probability-escalation (PIPE) method by Mander and Sweeting (2015) for two-dimensional dose-response, and will thus be called PIPE-classifiers; and the estimator for obtained by its inverse will be called iPIPE. Cheung et al. (2022) develop a decision-theoretic framework to motivate PIPE and outline the principle of extension to dimension without examining its computational feasibility. In this article, we will propose efficient computation algorithms that solve PIPE-classifiers and iPIPE simultaneously and demonstrate their feasibility for high-dimensional problems.
2 |. METHODS
2.1 |. A partially ordered classifier ensemble
We first introduce a classification problem of condition with respect to some threshold . Define the classifier ensemble where and is an indicator function. As a nondecreasing function in is also nondecreasing in in terms of partial ordering. We let denote the constrained space where lives. More generally, define
| (1) |
Then . Further let denote a subvector of on and it is easy to see that implies for any .
We consider estimation by maximising the objective function defined as follows:
| (2) |
where is the expectation of taken with respect to the posterior distribution of (to be elaborated in Section 4), and the weight is chosen to reflect the information content about condition and some . For brevity, we suppress the dependence of on in our notation. The subscript will also be omitted when , e.g., writing as , as , etc.
Proposition 1. is a weighted product of posterior gains over with respect to the gain function
| (3) |
Proposition 1 can be easily verified by taking expectations of the right-hand side of (3) with respect to the posterior distribution of for each . Under this framework, may be viewed as a decision parameter that defines the relative gains of a true negative and a true positive decision (Cheung et al., 2022), and may be used to control classification errors.
When we estimate an individual , we can also show that is a Bayes estimator for in that it maximises the gain function (3).
Proposition 2. If , then is maximised at .
Proposition 2 can be verified by observing that maximises the th factor in the product in (2) so that
| (4) |
Combining (2) and (4) and setting , we can write
| (5) |
As such, the classifier ensemble may be viewed as a projection of the Bayes classifier ensemble onto the constrained space . Proposition 2 implies that if is evaluated under a joint posterior of with support that satisfies partial ordering, the estimator will be its own projection. To ensure , one could use parametric models such as linear additive models to impose monotonicity. Alternatively, motivated by the ease and scalability of independently computing unconstrained and keeping model assumptions to a minimum, we propose applying (2) in conjunction with the unconstrained distribution of . The estimator thus obtained will be called a PIPE-classifier, as coined in Mander and Sweeting (2015) who introduce the special case of (2) with and for binomial outcomes over a two-dimensional grid.
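To make the construction concrete, the sketch below enumerates every monotone classifier ensemble on a toy 2×2 grid of binary factors with binomial outcomes and conjugate Beta posteriors, and keeps the ensemble maximising a weighted product of posterior gains. The data, the use of the sample size as weight, and the symmetric form of the gain (so that the unconstrained Bayes classifier is 1 whenever the posterior probability of exceeding the threshold is above one half) are illustrative assumptions rather than the exact specification in (2).

```python
import itertools
import numpy as np
from scipy.stats import beta

# Toy example: two binary factors give 4 partially ordered conditions.
# Hypothetical data per condition: (number of responses y, sample size n).
conditions = [(0, 0), (0, 1), (1, 0), (1, 1)]
data = {(0, 0): (1, 10), (0, 1): (4, 10), (1, 0): (3, 10), (1, 1): (8, 10)}

def precedes(a, b):
    """Partial Euclidean ordering: a is componentwise <= b."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_monotone(labels):
    """An ensemble must be nondecreasing with respect to the partial order."""
    return all(labels[a] <= labels[b]
               for a in conditions for b in conditions if precedes(a, b))

def log_objective(labels, tau):
    """Assumed form of (2): weighted product of posterior gains, with
    pi_k = P(theta_k > tau | data) under a Beta(1 + y, 1 + n - y) posterior
    and the sample size n_k as the weight (both choices are illustrative)."""
    total = 0.0
    for k, (y, n) in data.items():
        pi_k = beta.sf(tau, 1 + y, 1 + n - y)
        total += n * (np.log(pi_k) if labels[k] == 1 else np.log(1.0 - pi_k))
    return total

def pipe_classifier(tau):
    """Projection of the Bayes classifiers onto the constrained space by
    brute force: enumerate every monotone 0/1 ensemble, keep the maximiser."""
    best, best_val = None, -np.inf
    for bits in itertools.product([0, 1], repeat=len(conditions)):
        labels = dict(zip(conditions, bits))
        if is_monotone(labels) and log_objective(labels, tau) > best_val:
            best, best_val = labels, log_objective(labels, tau)
    return best

print(pipe_classifier(tau=0.35))
```

With only four conditions, the brute-force projection is trivial; the computational algorithms in Section 3 are aimed at the settings where such enumeration is no longer feasible.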
2.2 |. Inverting PIPE-classifiers (iPIPE)
In this subsection, we return to the estimation of . Viewing the estimand as an inverse of , we may write
| (6) |
The last equality in (6) holds because is nonincreasing in , and as such, its inverse is unambiguously defined. If a PIPE-classifier is nonincreasing in , an estimator for can be analogously defined as its inverse:
| (7) |
Lemma 1. Partition into and for a given . Define
| (8) |
Then . That is, for all , where is the th element of .
Theorem 1. Let denote the th element of defined in (2). If , then for .
Theorem 1 shows that is nonincreasing in , and as a result, the inverse of a PIPE-classifier (7) is well-defined. In addition, the following result provides the basis of choosing for point and interval estimation.
Proposition 3. If for all , then is equal to the -quantile of the posterior distribution of .
Generally, we do not expect except in special cases, such as when parametric models are used as discussed after Proposition 2. Proposition 3, however, provides an interpretation of in the context of estimation. Specifically, we may consider with , i.e., the posterior median, as a point estimate for , and obtain a 95% credible interval by evaluating with .
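The quantile interpretation can be checked directly in the simplest setting of a single condition with no ordering constraints: if the classifier takes the assumed form 1{P(θ > τ) ≥ 1 − q}, its inverse (7) is the posterior q-quantile. The short numerical check below uses an arbitrary Beta posterior and a grid of thresholds; the posterior and grid are purely illustrative.

```python
import numpy as np
from scipy.stats import beta

posterior = beta(5, 12)      # an arbitrary illustrative posterior for theta
q = 0.5                      # q = 0.5 corresponds to the posterior median

# Assumed single-condition classifier: C(tau) = 1{P(theta > tau) >= 1 - q}.
taus = np.arange(0.0, 1.0, 1e-4)
classified_one = posterior.sf(taus) >= 1 - q

# iPIPE per (7): the largest threshold still classified as 1.
ipipe_estimate = taus[classified_one].max()
print(ipipe_estimate, posterior.ppf(q))   # agree up to the grid resolution
```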
The proofs of Lemma 1, Theorem 1, and Proposition 3 are given in the Appendix.
3 |. COMPUTATION ALGORITHMS
3.1 |. A sweep algorithm
The main motivation for using the PIPE-classifier (2) is that each individual factor in the product can be easily and quickly computed without consideration of partial ordering; thus computations can be scaled to deal with problems with complex partial ordering structure as increases. When the dimension and number of conditions are small, one can evaluate the set by brute force, i.e., enumerating each and checking whether it belongs to . Given that is determined, the additional computational cost of (2) is . Enumerating the entire set , however, quickly becomes infeasible as and increase.
Suppose there exists such that for all and all , and such that for all and . Under this assumption, Proposition 2 implies that and because . Then the following sweep algorithm solves the inverse of PIPE-classifiers, or iPIPE, without the need to enumerate :
Iterate from to .
- For each ,
- Identify subset: Let and . Define , where the index set consists of conditions whose classifiers have been determined to be 0 and is initially a null set.
- Maximise: Evaluate the PIPE-classifiers and set equal to on the subset . Set for remaining .
- Sweep zeros: For all with , set for all and add these indices to .
Stop when for all , and evaluate according to (7).
For the sweep algorithm to yield the true maximiser for all , it requires:
Proposition 4. Step 2b yields the true maximiser of for the given .
As a consequence of Proposition 4 and Theorem 1, sweeping zeroes across the remaining (step 2c) yields the correct for all .
The core idea of the algorithm is to break down the maximisation problem into maximisation over subsets . In particular, since we start with a small value of , the set and hence are small at the beginning. As the algorithm iterates across , the set of determined zeroes increases thus limiting the size of and rendering the maximisation step feasible. Specifically, the computational cost in step 2b for each is provided that is determined.
Note that as an easy corollary to Theorem 1, if , then for . Thus, one can define an analogous algorithm that starts with and sweeps ones in the opposite direction.
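The sketch below implements one reading of the sweep algorithm on the same kind of toy grid used earlier. The data, the grid increment, and the symmetric gain with sample-size weights are illustrative assumptions, and step 2b is done by brute-force enumeration restricted to the undetermined subset, with conditions already swept to zero held fixed.

```python
import itertools
import numpy as np
from scipy.stats import beta

# Toy data as in the earlier sketch: (y_k, n_k) per condition on a 2x2 grid.
conditions = [(0, 0), (0, 1), (1, 0), (1, 1)]
data = {(0, 0): (1, 10), (0, 1): (4, 10), (1, 0): (3, 10), (1, 1): (8, 10)}

def precedes(a, b):
    return all(ai <= bi for ai, bi in zip(a, b))

def prob_exceeds(k, tau):
    """Unconstrained posterior probability P(theta_k > tau), Beta(1+y, 1+n-y)."""
    y, n = data[k]
    return beta.sf(tau, 1 + y, 1 + n - y)

def log_gain(k, label, tau):
    """Assumed symmetric gain with weight n_k (illustrative, as before)."""
    y, n = data[k]
    p = np.clip(prob_exceeds(k, tau), 1e-12, 1 - 1e-12)
    return n * (np.log(p) if label == 1 else np.log(1.0 - p))

step = 0.001
zeros, estimates = set(), {}
for tau in np.arange(step, 1.0, step):
    undetermined = [k for k in conditions if k not in zeros]
    if not undetermined:
        break
    # Step 2a: undetermined conditions whose unconstrained Bayes classifier is 0,
    # together with everything below them (a lower set), form the subset S.
    bayes_zero = [k for k in undetermined if prob_exceeds(k, tau) < 0.5]
    S = [j for j in undetermined if any(precedes(j, k) for k in bayes_zero)]
    if not S:
        continue
    fixed_one = [k for k in undetermined if k not in S]
    # Step 2b: brute-force maximisation over the (small) subset S only; the
    # determined zeros stay at 0 and the remaining conditions are set to 1.
    best, best_val = None, -np.inf
    for bits in itertools.product([0, 1], repeat=len(S)):
        labels = dict(zip(S, bits))
        full = {**{k: 0 for k in zeros}, **{k: 1 for k in fixed_one}, **labels}
        if any(full[a] > full[b] for a in conditions
               for b in conditions if precedes(a, b)):
            continue  # candidate violates partial ordering
        val = sum(log_gain(k, labels[k], tau) for k in S)
        if val > best_val:
            best, best_val = labels, val
    # Step 2c: sweep zeros forward and read off the iPIPE estimate (7),
    # accurate to the grid increment, for each newly determined condition.
    for k, lab in best.items():
        if lab == 0:
            zeros.add(k)
            estimates[k] = round(tau - step, 3)

print(estimates)
```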
3.2 |. Numerical illustration: clinical decisions for rehabilitation
We illustrate the sweep algorithm using a simulated data set in the context of evaluating clinical decisions for referring stroke patients to rehabilitation described in Section 1. For brevity in presenting the results, we consider only the four main factors in this subsection, i.e., having and . Even in this reduced problem, there are 2^81 possible classifier ensembles without the partial ordering constraint, and enumerating the partially ordered set from the unconstrained space would be computationally prohibitive.
In the simulated data set, for each condition, we first generated the number of patients with that condition and then generated the number of patients referred to rehabilitation given , i.e., where was the referral probability for condition . To analyse the simulated data, we postulated a uniform prior on for all , so that the unconstrained posterior distribution was based on which each was to be evaluated.
While the simulated data along with the specifications of and and the analysis results are provided in the web-based supporting materials, Table 1 gives some intermediate steps of the sweep algorithm applied to the simulated data with , described as follows:
We first determine the lower limit as the largest threshold (to the third decimal place) so that for all . We then iterate on a grid with increments of 0.001.
When , the condition is associated with , and thus belongs to ; see first row under the column in Table 1. By partial ordering, the conditions (0, 0, 0, 0), (0, 1, 0, 0), and (0, 0, 0, 1) are included in ; rows 2–4 in the table. Thus, the set consists of 4 conditions. Applying (2) on the subset gives for all .
When , the set remains the same, although the condition (0, 0, 0, 0) now belongs to and its associated by maximising over . As a result, this condition is added to and its associated would be set at 0 for all .
When , the set consists of 3 conditions, all of which are determined to have in the maximisation step, and are added to .
Table 1 further gives the intermediate results for to illustrate how the set changes over the iteration. Overall, while the number of conditions belonging to and grows as grows, the set also grows. The size of in this example ranges from 0 to 9 for all . Thus, the maximisation step (step 2b) is feasible computationally.
Note that the table only shows conditions that belong to at a given . Because for , they are not shown in the table to conserve space.
The iteration stops at when for all .
TABLE 1.
Illustration of the sweep algorithm applied to simulated data with outcomes for condition . For each , conditions that belong to are shown (under the columns ‘set’) along with the PIPE-classifier per step 2b.
| set | set | set | set | set | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (0,1,0,1) | 34 | 0 | 1 | 1 | 0 | 0 | 0 | |||||
| (0,0,0,0) | 23 | 0 | 1 | 0 | 0 | 0 | 0 | |||||
| (0,1,0,0) | 17 | 0 | 1 | 1 | 0 | 0 | 0 | |||||
| (0,0,0,1) | 26 | 1 | 1 | 1 | 0 | 0 | 0 | |||||
| (0,0,1,2) | 15 | 0 | 1 | 1 | ||||||||
| (0,0,1,0) | 22 | 1 | 1 | 1 | ||||||||
| (0,0,1,1) | 1 | 0 | 1 | 1 | ||||||||
| (0,0,0,2) | 29 | 1 | 1 | 1 | ||||||||
| (1,0,0,0) | 14 | 0 | 0 | |||||||||
| set | set | set | ||||||
|---|---|---|---|---|---|---|---|---|
| (0,1,0,1) | 34 | 0 | 0 | 0 | 0 | |||
| (0,0,0,0) | 23 | 0 | 0 | 0 | 0 | |||
| (0,1,0,0) | 17 | 0 | 0 | 0 | 0 | |||
| (0,0,0,1) | 26 | 1 | 0 | 0 | 0 | |||
| (0,0,1,2) | 15 | 0 | 1 | 1 | 1 | |||
| (0,0,1,0) | 22 | 1 | 1 | 1 | 1 | |||
| (0,0,1,1) | 1 | 0 | 1 | 1 | 1 | |||
| (0,0,0,2) | 29 | 1 | 1 | 1 | 0 | |||
| (1,0,0,0) | 14 | 0 | 0 | 0 | 0 | |||
| (0,2,0,1) | 33 | 1 | 1 | 0 | 0 | |||
| (0,2,0,0) | 5 | 0 | 0 | 0 | ||||
3.3 |. Sequential random subset maximisation
The feasibility of the sweep algorithm depends on the size of at each . The sweep algorithm could occasionally get stuck at a particular when is large. For a problem with large and , evaluating may not be feasible in general. For these large and large situations, we propose a sequential random subset maximisation (SRSM) method to approximate defined in the sweep algorithm. First determine a subset size :
Select random subset: Randomly select a subset , where and is the set associated with undetermined on and is initially set as .
Maximise: Evaluate and set equal to on the subset .
Impose partial orders: Let be the set of indices associated with whose values are implied by by partial ordering. Update on accordingly.
Update , and repeat step 1 until .
Set when the algorithm ends.
While there is no theoretical guarantee the maximisation step (step 2) in SRSM will yield the true , we may repeat the algorithm many times and select the classifiers with maximum .
SRSM is intended to be applied in conjunction with step 2b of the sweep algorithm. However, the method can be a stand-alone algorithm for the evaluation of , by replacing with in the algorithm. Table 2 summarises the performance of the SRSM algorithm when applied to the simulated data in Section 3.2 for directly evaluating the full vector at . In this case, as we know the ground truth from the sweep algorithm, the table records how frequently the SRSM is correct for different values of . As expected, the algorithm is correct more often with a larger . In reality, where we do not have the ground truth, we only require the algorithm to be correct at least once (instead of most of the time). Therefore, as long as there is a non-trivial likelihood of getting the correct on each SRSM run, the probability of getting the correct answer upon repeated runs will be very high. In our example, the likelihood is quite high (at 38%) even when a small subset is sampled out of the possible conditions.
TABLE 2.
Performance of the SRSM algorithm for evaluating in the simulated rehab data () using different . The SRSM algorithm is repeated 100 times for each . The ground truth is known based on the sweep algorithm.
| Subset size | 5 | 7 | 9 | 11 | 12 | 13 | 14 | 15 | 20 |
|---|---|---|---|---|---|---|---|---|---|
| Number of correct classifications | 38 | 45 | 46 | 46 | 51 | 59 | 57 | 57 | 58 |
To illustrate how SRSM works with the sweep algorithm for a large , we consider another simulated data set that includes all 7 factors in a rehabilitation chart review, i.e., a total of 648 conditions. The data generation model is described in Section 5.3 below. At in the sweep algorithm, the set consists of 25 conditions. First, to obtain the ground truth , we enumerated the entire set , identified 67,929 partially ordered classifier ensembles, and evaluated per (2). Next, we applied SRSM with out of possible conditions and repeated the algorithm 100 times. Five of the 100 repetitions yielded the true . With , the number of correct classifications increased to 14. The SRSM algorithm with and 100 repetitions took less than a minute to run on a local computer, while it took a few hours to enumerate on the same machine.
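As a concrete illustration of the stand-alone version described above, the sketch below runs SRSM repeatedly at a fixed threshold on a toy 3×3 grid. The fake data, the subset size, the number of repetitions, and the symmetric gain with sample-size weights are all illustrative assumptions.

```python
import itertools
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Toy setting: a 3x3 grid of conditions, fake (y, n) data, fixed threshold tau.
conditions = [(i, j) for i in range(3) for j in range(3)]
data = {k: (int(rng.integers(0, 11)), 10) for k in conditions}
tau = 0.35

def precedes(a, b):
    return all(ai <= bi for ai, bi in zip(a, b))

def log_gain(k, label):
    """Assumed symmetric gain with weight n_k, as in the earlier sketches."""
    y, n = data[k]
    p = np.clip(beta.sf(tau, 1 + y, 1 + n - y), 1e-12, 1 - 1e-12)
    return n * (np.log(p) if label == 1 else np.log(1.0 - p))

def consistent(labels):
    return all(labels[a] <= labels[b]
               for a in labels for b in labels if precedes(a, b))

def srsm_once(m):
    labels, undecided = {}, set(conditions)
    while undecided:
        # Step 1: select a random subset of the undecided conditions.
        pool = sorted(undecided)
        pick = rng.choice(len(pool), size=min(m, len(pool)), replace=False)
        U = [pool[i] for i in pick]
        # Step 2: maximise over the subset, keeping consistency with fixed labels.
        best, best_val = None, -np.inf
        for bits in itertools.product([0, 1], repeat=len(U)):
            cand = dict(zip(U, bits))
            if not consistent({**labels, **cand}):
                continue
            val = sum(log_gain(k, b) for k, b in cand.items())
            if val > best_val:
                best, best_val = cand, val
        labels.update(best)
        # Step 3: impose the labels implied by partial ordering.
        for k in sorted(undecided - set(labels)):
            if any(v == 1 and precedes(j, k) for j, v in labels.items()):
                labels[k] = 1
            elif any(v == 0 and precedes(k, j) for j, v in labels.items()):
                labels[k] = 0
        undecided -= set(labels)
    return labels

# Repeat the algorithm and keep the ensemble with the largest objective.
runs = [srsm_once(m=4) for _ in range(50)]
best = max(runs, key=lambda lab: sum(log_gain(k, lab[k]) for k in conditions))
print(sum(best.values()), "conditions classified as 1")
```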
4 |. APPLICATIONS
4.1 |. Estimating population parameters under partial ordering
iPIPE is directly applicable to the estimation of population parameters associated with partially ordered conditions. Generally, let denote a random sample of size from a population with parameter and nuisance parameters under condition , where is nondecreasing in in terms of partial ordering. Then the PIPE-classifier (2) and its inverse (7) can be evaluated with with respect to the posterior marginal distribution of .
To illustrate, consider normal data with mean and variance . Unconstrained Bayesian inference about and may be performed independently for each based on a semiconjugate prior: and . The posterior distribution of and can then be simulated from
| (9) |
where is the sample mean of the random sample, the function denotes standard normal density, is the normal likelihood given , and
Suppose that ’s are nondecreasing in in terms of partial ordering, i.e., setting with nuisance parameters . Then we have
| (10) |
where is the standard normal distribution function. Instead of evaluating (10) numerically, we can compute , for each given , by drawing according to the posterior distribution (9).
In another example, consider binomial variable with size and response probability under condition . Postulating a conjugate beta prior on the probability parameter, we can draw unconstrained inference about based on the posterior distribution , i.e., , where is the distribution function of beta . To illustrate in a real data application, we consider data on 5,087 users in an app recommender study (Cheung et al., 2018). In this analysis, each user received app recommendations on a weekly basis and their usage was tracked over a four-week period. Specifically, in each of the four weeks, a user would receive app recommendations for . For illustration purposes, we consider a dichotomised response, defined as engaging with an app in the system at least three times during the week following the four-week period. The left panel of Figure 1 summarises the recommendation patterns and the app use data for each condition: in the 5,087 users, we observe a total of patterns.
Figure 1.

Analysis of app recommender data in 5,087 users. In the forest plots, solid circles and squares respectively indicate posterior median of for unconstrained estimation and PIPE (i.e., ); and each line indicates a 95% credible interval. The conditions are ordered according to the posterior median based on iPIPE. The dotted vertical lines in both plots indicate the largest iPIPE median and are given as a reference to indicate variability: the unconstrained estimates apparently are more variable across conditions than iPIPE.
The unconstrained estimates are obtained with uniform prior on each (i.e., ) and are plotted in the right panel of Figure 1, along with the iPIPE estimates (medians and 95% credible intervals) using the sweep algorithm. The impact of the partial ordering assumption is clearly demonstrated as the iPIPE estimates differ from the unconstrained estimates in two ways. First, iPIPE depicts which conditions are not effective with very low estimated response rates, whereas the unconstrained estimates are more variable across conditions. Second, the 95% credible intervals based on iPIPE are much narrower than those based on unconstrained estimation, particularly when is small.
4.2 |. Regression models
This subsection describes applications of iPIPE in regression models that include covariates or confounding variables as well as the multi-factor condition as predictor variables, for different outcome types. First consider linear regression for normal data:
| (11) |
where and respectively denote the response and the condition of subject with covariates and normal noise with variance for . Unconstrained Bayesian inference can be performed with the standard non-informative prior, which is uniform on , or equivalently, (Gelman et al., 1995). However, when is large and the numbers of observations for some conditions are small, a proper prior on ’s may be used instead. Generally, it is easy to draw from the posterior distribution with respect to other prior distributions such as normal using standard software such as RStan (Stan Development Team, 2021); see Section 5.1 for an example.
Under model (11), the effects of the conditions are expressed in terms of the intercepts ’s. If these intercepts are partially ordered according to the conditions, we can set and the response surface can be estimated by with for , where the expectation is taken with respect to the unconstrained posterior distribution of . The nuisance parameters in this application are .
In some applications, one of the conditions (say ) is a control condition and the interest is in estimating the effect due to relative to , i.e., . As such, we may impose no constraint between and other ’s; rather, partial ordering is applied to for . That is, the response surface of interest has parameters and can be estimated using iPIPE with for , where the expectation is taken with respect to the unconstrained posterior distribution of . This illustrates how iPIPE can be applied to address constraints on different contrasts in model (11) in different applications, once the unconstrained posterior of ’s is obtained.
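With unconstrained posterior draws of the intercepts in hand (from RStan or any other sampler), switching between the two estimands above only changes which functional of the draws is thresholded. The draws, the condition counts used as weights, and the choice of condition 0 as the control in the sketch below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
S, K = 4000, 5                          # posterior draws and conditions
alpha_draws = rng.normal(size=(S, K))   # placeholder draws of the intercepts
n_k = np.array([30, 12, 8, 15, 20])     # hypothetical per-condition counts (weights)

def classifier_inputs(tau, contrast_to_control=False):
    """P(theta_k > tau) and weights for the PIPE-classifier at threshold tau,
    where theta_k is either the intercept itself or its contrast with the
    control condition (taken here to be condition 0)."""
    theta = alpha_draws - alpha_draws[:, [0]] if contrast_to_control else alpha_draws
    return (theta > tau).mean(axis=0), n_k

print(classifier_inputs(tau=0.2, contrast_to_control=True))
```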
Generalised linear models are commonly used for regression analysis of non-normal response data, including logistic and probit regression for binary data, Poisson regression for count data, gamma regression for nonnegative continuous data, as well as linear regression. The expected response is -linear in the regression coefficients, i.e.,
| (12) |
where is a known link function (McCullagh and Nelder, 1989). There is a large literature on Bayesian analysis of the generalised linear models. Ibrahim and Laud (1991), for example, give an in-depth discussion on the use of Jeffrey’s prior. Due to the large number of conditions, however, proper priors such as normal may be used for ’s; see Dellaportas and Smith (1993) who discuss an application using Gibbs sampling to compute the posterior. The response surface is to be defined in terms of ’s according to the application and estimated using iPIPE in an analogous way as in the linear model described above.
For right-censored survival data, Cox’s model assumes that the hazard function at time is given by where is the baseline hazard and the conditions and covariates have multiplicative effects on the hazards via defined analogously to (12). While the focus of inference is often on the regression coefficients , many Bayesian approaches to handle have been considered, including parametric models (Dellaportas and Smith, 1993), nonparametric prior process such as the gamma process (Burridge, 1981; Sinha and Dey, 1997), and avoiding modeling via the use of partial likelihood (Sinha et al., 2003). For example, the partial likelihood under proportional hazards is free of the nuisance and can be expressed
| (13) |
| (14) |
where is the minimum of the censoring time and survival time of subject , is the indicator of observing the survival time, and is the risk set at time . Note that when is unspecified, and assuming that such that , the right-hand-side of expression (13) is over-parameterised as the term can be absorbed into the baseline hazard function. Unconstrained Bayesian inference about can be based on partial likelihood (14) and the corresponding approximate posterior density , which is a limiting marginal posterior of under a fully Bayesian approach with a diffuse gamma process prior on (Sinha et al., 2003). If the response surface thus defined is assumed nondecreasing in in terms of partial ordering, it can be estimated using iPIPE based on the unconstrained posterior draws of ’s with for in the same way as in the linear and generalised linear models. Note that if, in some applications, the response surface of interest is , a parametric form of is needed to ensure identifiability; see for example Dellaportas and Smith (1993) who propose Gibbs sampling for a proportional hazards model with Weibull survival times, where the full conditionals are simulated using adaptive rejection sampling (Gilks and Wild, 1992).
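Any generic MCMC scheme can be used to draw from the partial-likelihood-based posterior. The sketch below targets the exponential of a Breslow-type partial log-likelihood under a flat prior using random-walk Metropolis; the simulated data, the single binary condition indicator, and all tuning constants are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated right-censored data with a single binary condition indicator.
n = 200
x = rng.integers(0, 2, size=(n, 1)).astype(float)
t_event = rng.exponential(scale=np.exp(-0.7 * x[:, 0]))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens                      # indicator of observing the event

def cox_partial_loglik(beta_val):
    """Breslow-type partial log-likelihood for right-censored data."""
    eta = x @ beta_val
    lp = 0.0
    for i in np.where(event)[0]:
        risk = time >= time[i]                 # risk set at the i-th event time
        lp += eta[i] - np.log(np.sum(np.exp(eta[risk])))
    return lp

# Random-walk Metropolis targeting exp(partial log-likelihood) under a flat prior.
beta_cur = np.zeros(1)
loglik_cur = cox_partial_loglik(beta_cur)
draws = []
for _ in range(4000):
    prop = beta_cur + rng.normal(scale=0.2, size=1)
    loglik_prop = cox_partial_loglik(prop)
    if np.log(rng.uniform()) < loglik_prop - loglik_cur:
        beta_cur, loglik_cur = prop, loglik_prop
    draws.append(beta_cur.copy())
draws = np.array(draws)

# Classifier input at threshold tau for a condition with covariate pattern x0.
x0, tau = np.array([1.0]), 0.5
print(np.mean(draws @ x0 > tau))               # Monte Carlo P(x0' beta > tau)
```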
4.3 |. Covariate-dependent response surface
Model (11) can be extended to accommodate situations where the multi-factor condition interacts with the covariates, i.e., having
| (15) |
For the coefficients of the interaction terms, we may postulate a priori, independently of the prior specified for the main effects ’s and and the variance term for model (11). A relatively informative prior, i.e., a small , corresponds to the assumption that the effects of the conditions follow exactly the same order under all . In the special case when (i.e., a degenerate prior), the response surface will follow the same full order of ’s for each . Generally, the full order of a covariate-dependent response surface
| (16) |
varies depending on even if it is subject to the same partial ordering constraint with respect to the condition . The response surface (16) can be estimated using iPIPE, for each given , with where the expectation is taken with respect to the unconstrained posterior distribution of . Applications with the generalised linear models and Cox’s model are analogous.
4.4 |. Hierarchical models for repeated measurements
In situations where an individual has repeated observations under different conditions, one may estimate the individual response surface using hierarchical models. Let be the th measurement of individual under . An outcome model for these individuals can be expressed as
| (17) |
where the individual effects ’s of condition can be viewed as random effects that are potentially dependent on each other via underlying risk factors or confounding variables , and are possibly correlated with noise . As (17) is quite general, we illustrate it with a concrete example where we examined the individual effects of sedentary breaks on cognitive performance in 11 participants. A sedentary break condition was defined by 2 factors over an 8-hour period: break frequency and duration . In addition to a control condition with no sedentary breaks denoted as , each factor had two levels in the experiments:
low frequency (; a break every 60 minutes) vs high frequency (; a break every 30 minutes);
low break duration (; 1 minute per break) vs high duration (; 5 minutes per break).
Thus, each participant would be evaluated under 5 conditions on 5 different days with a 4–14 day washout period between conditions; see Table 3 for the list of conditions for . We were interested in estimating the effects of sedentary breaks relative to the control condition in terms of the change in the Symbol Digit Modalities Test (SDMT) over an 8-hour period. While it was reasonable to assume that the change in SDMT increases with each factor for each individual, we did not impose constraints between the control and the other conditions, as we were interested in estimating for for each .
TABLE 3.
Estimation of the population-level response surface using the sedentary break data in 11 individuals (6 men and 5 women). is the number of observations available for a given condition . For the unconstrained posterior and the constrained posterior, the median (‘med’) of the posterior draws of is reported along with the 0.025 and 0.975 posterior quantiles (‘95% int’). For iPIPE, the respective quantities are obtained by setting .
| (a) Dose-response analysis results for men | ||||||||
|---|---|---|---|---|---|---|---|---|
| Condition | (freq., dur.) | No. obs | Unconstrained | | iPIPE, | | Constrained posterior | |
| | | | med | 95% int | med | 95% int | med | 95% int |
| 2 | (1,1) | 6 | 2.85 | (−2.37, 10.1) | 0.94 | (−4.53, 7.57) | −0.37 | (−5.68, 4.84) |
| 3 | (1,2) | 6 | −0.12 | (−5.33, 6.67) | 0.94 | (−4.53, 7.57) | 1.94 | (−3.26, 7.50) |
| 4 | (2,1) | 6 | −0.32 | (−5.46, 6.54) | 0.94 | (−4.53, 7.57) | 1.98 | (−3.17, 7.43) |
| 5 | (2,2) | 5 | 5.74 | (−0.17, 12.4) | 5.74 | (−0.17, 12.4) | 6.86 | (1.22, 14.1) |
| (b) Dose-response analysis results for women | ||||||||
| Condition | (freq., dur.) | No. obs | Unconstrained | | iPIPE, | | Constrained posterior | |
| | | | med | 95% int | med | 95% int | med | 95% int |
| 2 | (1,1) | 5 | 2.40 | (−3.35, 9.73) | 1.85 | (−3.83, 9.74) | 1.04 | (−2.52, 2.15) |
| 3 | (1,2) | 4 | 5.60 | (−0.58, 12.82) | 2.87 | (−3.27, 10.7) | 2.74 | (1.10, 5.80) |
| 4 | (2,1) | 5 | 1.56 | (−4.48, 9.38) | 1.85 | (−3.83, 9.74) | 1.33 | (0.10, 4.45) |
| 5 | (2,2) | 3 | 0.44 | (−8.22, 7.01) | 2.87 | (−3.27, 10.7) | 2.77 | (2.74, 8.03) |
We thus rewrite (17) as
| (18) |
and postulate that each is nondecreasing in in terms of partial ordering. Under the parameterisation (18), indicates the mean response of participant under the control condition. For the individual-level parameters, we postulate a priori that:
;
for for each individual , where is the gender of individual .
That is, the condition effects account for gender in addition to the variability among individuals in the population. Further, the population-level parameters, namely , and , have the following prior distributions:
has an improper flat prior, i.e., ;
;
;
all variance parameters follow an inverse chi-squared distribution with 1 degree of freedom.
The “layer” of the population-level parameters in the hierarchical model facilitates pooling data from across individuals, but does not take advantage of partial ordering. To estimate a monotone individual response surface , one could evaluate using iPIPE according to the PIPE-classifier ensemble (2) and its inverse (7) with , where the expectation is taken with respect to the posterior of , for .
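Applying iPIPE at the individual level again reduces to tail probabilities of the appropriate marginal posteriors. The sketch below assumes a three-way array of posterior draws of the individual condition effects (placeholder values, with the study's 11 participants and 4 break conditions) and returns the classifier inputs for every individual and condition.

```python
import numpy as np

rng = np.random.default_rng(5)

# beta_draws[s, i, k]: posterior draw s of the effect of condition k for
# individual i from the hierarchical model (placeholder values here).
S, n_ind, K = 4000, 11, 4
beta_draws = rng.normal(size=(S, n_ind, K))

def individual_probs(tau):
    """P(beta_ik > tau | data) for every individual i and condition k; these
    are the inputs to a separate PIPE-classifier per individual."""
    return (beta_draws > tau).mean(axis=0)

print(individual_probs(tau=0.0).shape)   # (11, 4): one row per individual
```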
In addition, we can make constrained inference about the population-level parameters using iPIPE. Specifically, we may define the population-level response surface
which is dependent on the covariate and is subject to the same partial ordering constraints as the individual ’s. Then, for each given , the response surface can be estimated using iPIPE with for where the expectation is defined with respect to the unconstrained posterior of .
The sedentary break data was fitted using model (18) with the above hierarchical structure using RStan with 4 chains each having 20,000 iterations after 5,000 warmup samples, and iPIPE was applied with (point estimate), 0.025, 0.975 (95% credible interval), and (number of observations for condition across individuals). As a comparison method, we also considered the constrained posterior that only included from the RStan samples that met partial ordering. Similar analyses were conducted for the population-level ’s.
The analysis results for population-level response surface are given in Table 3. When the unconstrained estimates do not violate any partial ordering constraint, the iPIPE estimates are similar to the unconstrained estimates; e.g., condition 4 in Table 3(a). When the unconstrained estimates violate partial ordering, iPIPE pools data from across conditions and equalises the estimates; e.g., in Table 3(b), iPIPE results in for conditions 1 and 3. In this regard, iPIPE behaves similarly to PAVA. In contrast, estimation based on the naive constrained posterior reverses the order of the unconstrained estimates for conditions 1 and 3. Similarly, the constrained posterior leads to significant conclusions (i.e., 95% credible interval excluding 0) for conditions 3 and 4 in Table 3(b), which could be an artifact of the constrained sampling scheme rather than the data.
Figure 2 displays the individual response surface estimates of 11 participants using the unconstrained posterior, iPIPE, and the constrained posterior. By imposing partial ordering, iPIPE (Figure 2(b)) reduces between-individual variability and produces estimates in a range similar to the unconstrained fits (Figure 2(a)). In contrast, the constrained posterior seems to lead to artificially exaggerated dose-response relationships at individual levels; e.g., see estimates for individual “f”.
Figure 2.

Estimated individual response surface in the sedentary break study, for . The estimates for participant “f” are color-coded for visibility.
5 |. SIMULATION STUDY
5.1 |. Simulation setting 1: with equal sample size
In this section, we evaluate the performance of iPIPE when compared to other methods using simulation. The first simulation study examines situations with small and , namely, with and and conditions. Each condition has an equal number of independent observations, generated from a normal distribution with mean and a common standard deviation , where . Four scenarios of are considered in the simulation and are given in Table 4.
TABLE 4.
Simulation setting 1: with equal sample size
| (a) Scenario 1: | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Condition | | True | Unconstrained | | | iPIPE, | | | Constrained posterior | | | PAVA | |
| | | | bias | mse | cov | bias | mse | cov | bias | mse | cov | bias | mse |
| 1 | 1 | 3.0 | −0.07 | 0.79 | 0.96 | −0.26 | 0.60 | 0.98 | −0.90 | 1.17 | 0.77 | −0.27 | 0.60 |
| 2 | 1 | 3.4 | 0.01 | 0.82 | 0.96 | −0.01 | 0.43 | 0.99 | −0.12 | 0.30 | 0.98 | −0.01 | 0.43 |
| 3 | 1 | 3.7 | 0.04 | 0.84 | 0.96 | 0.23 | 0.60 | 0.98 | 0.67 | 0.81 | 0.86 | 0.23 | 0.59 |
| 1 | 2 | 5.0 | −0.10 | 0.75 | 0.95 | −0.27 | 0.53 | 0.98 | −0.69 | 0.75 | 0.87 | −0.27 | 0.54 |
| 2 | 2 | 5.4 | −0.02 | 0.81 | 0.96 | −0.04 | 0.41 | 0.99 | 0.07 | 0.24 | 0.98 | −0.05 | 0.41 |
| 3 | 2 | 5.7 | 0.01 | 0.83 | 0.95 | 0.24 | 0.60 | 0.98 | 0.93 | 1.19 | 0.79 | 0.23 | 0.59 |
| (b) Scenario 2: | |||||||||||||
| Condition | | True | Unconstrained | | | iPIPE, | | | Constrained posterior | | | PAVA | |
| | | | bias | mse | cov | bias | mse | cov | bias | mse | cov | bias | mse |
| 1 | 1 | 3.0 | −0.05 | 0.80 | 0.95 | −0.29 | 0.57 | 0.97 | −1.00 | 1.30 | 0.76 | −0.29 | 0.57 |
| 2 | 1 | 3.2 | −0.01 | 0.77 | 0.96 | −0.01 | 0.36 | 0.99 | −0.11 | 0.20 | 0.99 | −0.01 | 0.36 |
| 3 | 1 | 3.5 | 0.03 | 0.77 | 0.96 | 0.26 | 0.52 | 0.99 | 0.67 | 0.81 | 0.85 | 0.26 | 0.51 |
| 1 | 2 | 5.0 | −0.05 | 0.82 | 0.95 | −0.33 | 0.61 | 0.97 | −0.93 | 1.11 | 0.77 | −0.34 | 0.61 |
| 2 | 2 | 5.1 | −0.03 | 0.79 | 0.96 | −0.01 | 0.34 | 0.99 | 0.07 | 0.20 | 0.99 | −0.01 | 0.34 |
| 3 | 2 | 5.2 | 0.01 | 0.78 | 0.97 | 0.31 | 0.54 | 0.98 | 1.06 | 1.39 | 0.66 | 0.30 | 0.54 |
| (c) Scenario 3: | |||||||||||||
| Condition | | True | Unconstrained | | | iPIPE, | | | Constrained posterior | | | PAVA | |
| | | | bias | mse | cov | bias | mse | cov | bias | mse | cov | bias | mse |
| 1 | 1 | 2.0 | −0.10 | 0.81 | 0.95 | −0.36 | 0.62 | 0.97 | −1.08 | 1.46 | 0.71 | −0.37 | 0.62 |
| 2 | 1 | 2.1 | −0.02 | 0.85 | 0.95 | −0.05 | 0.36 | 0.99 | −0.12 | 0.24 | 0.98 | −0.05 | 0.36 |
| 3 | 1 | 2.2 | 0.00 | 0.87 | 0.95 | 0.31 | 0.59 | 0.98 | 0.87 | 1.09 | 0.81 | 0.31 | 0.59 |
| 1 | 2 | 4.0 | −0.05 | 0.78 | 0.97 | −0.27 | 0.59 | 0.98 | −0.74 | 0.93 | 0.85 | −0.27 | 0.59 |
| 2 | 2 | 4.3 | −0.01 | 0.76 | 0.96 | −0.02 | 0.36 | 0.99 | 0.10 | 0.28 | 0.99 | −0.02 | 0.36 |
| 3 | 2 | 4.4 | 0.06 | 0.75 | 0.96 | 0.30 | 0.55 | 0.97 | 1.03 | 1.35 | 0.71 | 0.30 | 0.54 |
| (d) Scenario 4: | |||||||||||||
| Condition | | True | Unconstrained | | | iPIPE, | | | Constrained posterior | | | PAVA | |
| | | | bias | mse | cov | bias | mse | cov | bias | mse | cov | bias | mse |
| 1 | 1 | 1.0 | 0.02 | 0.78 | 0.95 | −0.30 | 0.56 | 0.97 | −1.13 | 1.58 | 0.65 | −0.30 | 0.56 |
| 2 | 1 | 1.8 | −0.06 | 0.82 | 0.96 | −0.21 | 0.42 | 0.99 | −0.58 | 0.50 | 0.95 | −0.22 | 0.42 |
| 3 | 1 | 2.3 | 0.05 | 0.80 | 0.96 | 0.05 | 0.41 | 0.99 | 0.19 | 0.24 | 0.98 | 0.05 | 0.42 |
| 1 | 2 | 1.0 | −0.01 | 0.74 | 0.96 | 0.19 | 0.44 | 0.99 | 0.14 | 0.28 | 0.99 | 0.18 | 0.44 |
| 2 | 2 | 2.1 | 0.03 | 0.84 | 0.95 | 0.11 | 0.43 | 0.99 | 0.37 | 0.33 | 0.95 | 0.11 | 0.43 |
| 3 | 2 | 2.8 | −0.07 | 0.66 | 0.98 | 0.16 | 0.47 | 0.99 | 0.99 | 1.22 | 0.71 | 0.16 | 0.47 |
mse, mean squared error; cov, coverage probability.
Unconstrained estimates of are obtained by fitting a Bayesian linear model with and following an inverse chi-squared distribution with 1 degree of freedom. We ran 4 chains in RStan each having 1,250 iterations after discarding 1,000 warmup samples (i.e., having 5,000 samples from the unconstrained posterior). The iPIPE point estimates are then obtained with and interval estimates with and 0.975. We also evaluate the constrained posterior estimates that include only posterior draws (out of 5,000 total) that meet partial ordering. In addition, we consider the frequentist PAVA as a comparison method.
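The constrained-posterior comparator amounts to simple rejection of unconstrained draws. The sketch below keeps only those draws of the six condition means that are nondecreasing in each factor on the 3×2 grid; the draws are placeholders centred at the Scenario 1 means of Table 4(a) with an assumed posterior spread.

```python
import numpy as np

rng = np.random.default_rng(4)

# mu_draws[s, i, j]: draw s of the mean for the condition at level i of factor 1
# and level j of factor 2 on the 3x2 grid; placeholder unconstrained draws
# centred at the Scenario 1 means with an assumed posterior spread.
mu_draws = rng.normal(loc=[[3.0, 5.0], [3.4, 5.4], [3.7, 5.7]],
                      scale=0.8, size=(5000, 3, 2))

def respects_partial_order(mu):
    """Nondecreasing along each factor of the grid."""
    return bool(np.all(np.diff(mu, axis=0) >= 0) and np.all(np.diff(mu, axis=1) >= 0))

keep = np.array([respects_partial_order(m) for m in mu_draws])
constrained = mu_draws[keep]
print(keep.mean())                        # acceptance rate of the rejection step
print(np.median(constrained, axis=0))     # constrained posterior medians
```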
The estimation properties of these methods are compared in Table 4. Overall, iPIPE consistently yields smaller mean squared error (mse) than the unconstrained estimates; the efficiency gain in terms of mse is quite substantial and results in reductions greater than 50% under some scenarios. The bias of iPIPE is small relative to mse. In contrast, the constrained posterior median can yield large bias, which in turn results in a large mse. Additionally, even though it is feasible to evaluate the constrained posterior estimates in this simulation because of the low dimension, the proportions of draws that meet partial ordering are low, with means ranging from 4% to 6% in the scenarios considered.
Finally, it is noteworthy that the estimation properties (bias and mse) of iPIPE are very similar to PAVA, while the former naturally produces interval estimation for inference. The simulation shows that 95% credible intervals of iPIPE achieve nominal (if conservative) coverage probability, whereas biases of the constrained posterior estimates lead to lower-than-nominal coverage probability.
5.2 |. Simulation setting 2: with uneven sample sizes
The second simulation study examines iPIPE when with and conditions with unequal sample sizes . This represents situations with sparse sampling of some conditions. A binomial outcome was generated with size and probability for condition in each simulation replicate. Unconstrained estimates of are obtained as the median of the beta posterior assuming a uniform prior. Correspondingly, iPIPE estimates are obtained with and . In this case, it is infeasible to evaluate the constrained posterior using only draws that meet partial ordering, because the partial ordering constraints are restrictive and the acceptance rate is extremely small. Similarly, it is not straightforward to implement PAVA in this setting.
Figure 3 plots the distributions of the unconstrained and iPIPE point estimates under a given based on 1,000 simulation replicates. Overall, the iPIPE estimates exhibit smaller variability than the unconstrained estimates, without inducing noticeable bias. Reduction in variability is pronounced when is small; for example, and = (1, 1, 1, 2) when . Additional simulation scenarios (given in web-based supporting material) confirm similar observations.
Figure 3.

Simulation setting 2: with unequal sample sizes and binomial outcomes. The horizontal axis of the box plots indicates the difference between true and the respective point estimates for each .
In addition to point estimates, 95% credible intervals are obtained for the unconstrained method and iPIPE. The coverage probabilities for all 16 conditions range from 0.96 to 0.99 for iPIPE, and from 0.94 to 0.98 for the unconstrained method. While the iPIPE estimates appear to be more conservative, they have narrower intervals on average: the average width over all 16 conditions is 0.25 for iPIPE, compared to 0.30 for the unconstrained method.
5.3 |. Simulation setting 3: with uneven sample sizes
In this subsection, we conduct simulation for settings with and with binomial outcomes as described in Section 1. In the simulated data, for condition , we first generate where ’s are independent uniform(0,1) and keep fixed across simulation replicates. In each replicate, we draw where
| (19) |
We analyse each simulated data set using iPIPE implemented by the sweep/SRSM algorithm with and 100 repetitions, as well as the unconstrained posterior estimate, with a priori. Figure 4 shows the results based on 100 simulated data sets. Both methods have a similar magnitude of bias. However, the unconstrained median demonstrates a clear trend of positive bias when the true is small and negative bias when the true is large, when is small so that the uniform prior has a large influence. In contrast, the biases of iPIPE are much attenuated for conditions with small . This translates into a smaller mse for iPIPE, noticeably when is small.
Figure 4.

Simulation setting 3: with unequal sample sizes and binomial outcomes. Each circle in the figures represents a condition and the size of the circle is proportional to .
The respective average coverage probabilities for the unconstrained method and iPIPE are 0.95 and 0.99. While iPIPE appears to be conservative, it achieves the nominal level with a higher precision (average interval width 0.33) than the unconstrained method (average interval width 0.38).
5.4 |. Simulation setting 4: The effect of dimension
Finally, we consider settings with conditions defined by binary factors, i.e., for each , where , with total sample size or 10000. An objective of this simulation study is to examine the impact of on the performance of iPIPE. There are possible conditions for a given . We first generate ’s from a multinomial distribution with size and probability for each ; and keep ’s fixed across simulation replicates. In each replicate, we generate a random sample of size under condition from an exponential distribution with rate , i.e., for and , where
| (20) |
For the unconstrained Bayesian inference, we postulate ’s to be exchangeable Gamma variables with shape 0.1 and scale 10 a priori, so the posterior is Gamma with shape and scale . The iPIPE point estimates are then obtained with and interval estimates with , and 0.975 using the sweep/SRSM algorithm with and 50 repetitions. iPIPE and the unconstrained Bayesian inference are evaluated based on 50 simulation replicates.
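Because the Gamma prior is conjugate to the exponential likelihood, the classifier probabilities in this setting are available in closed form. The observations in the sketch below are placeholders; the conjugate update shown is the standard one for an exponential rate under the stated Gamma(shape 0.1, scale 10) prior.

```python
import numpy as np
from scipy.stats import gamma

# One condition with a handful of exponential observations (placeholder data).
obs = np.array([0.8, 1.9, 0.4])
shape0, scale0 = 0.1, 10.0                      # Gamma prior: shape 0.1, scale 10
shape_post = shape0 + obs.size                  # conjugate update of the shape
scale_post = 1.0 / (1.0 / scale0 + obs.sum())   # conjugate update of the scale

# Classifier input at threshold tau: P(lambda_k > tau | data) in closed form.
tau = 0.6
print(gamma.sf(tau, shape_post, scale=scale_post))
```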
Both methods have a similar magnitude of bias, which seems to remain in a similar range as increases (Figure 5). In contrast, iPIPE has much smaller mse than unconstrained inference, especially as increases (Figure 6). We note that these are simulation scenarios with sparse observations. For example, when and , there are conditions with at least one observation, i.e., about 1.7 observations per condition. Hence, iPIPE improves upon the unconstrained inference by reducing variability, while keeping bias similar.
Figure 5.

Simulation setting 4: Bias vs true for and . Each circle in the figures represents a condition and the size of the circle is proportional to .
Figure 6.

Simulation setting 4: Root mean squared error vs true for and . Each circle in the figures represents a condition and the size of the circle is proportional to .
Table 5 shows the aggregate performance of the two methods in terms of bias, mse, coverage probability, and mean width of 95% credible intervals averaged across all ’s for each pair . The average biases are small relative to mse and remain within a narrow range. As expected, mse increases as increases, as the situation reflects increasingly sparse observations per condition. However, the performance of iPIPE relative to unconstrained inference improves quite substantially as increases, reflecting the information contained in the partial ordering assumption. Additionally, by doubling from 5000 to 10000, the average mse is reduced by 40% to 50% for all considered. For iPIPE, we also examine the estimation properties of the PIPE-classifier ensemble in terms of the average classification error (ACE) across all conditions, defined as
| (21) |
TABLE 5.
Simulation setting 4: with sparse sampling
| | | | Unconstrained | | | | iPIPE, | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Factors | Total sample size | No. conditions | bias | mse | cov | width | bias | mse | cov | width | avg ACE |
| 10 | 5000 | 1014 | 0.18 | 1.10 | 0.96 | 3.2 | 0.067 | 0.13 | 0.99 | 2.40 | 0.029 |
| | 10000 | 1024 | 0.095 | 0.45 | 0.95 | 1.93 | 0.087 | 0.092 | 0.99 | 1.72 | 0.026 |
| 11 | 5000 | 1876 | 0.31 | 1.94 | 0.96 | 5.03 | −0.021 | 0.18 | 0.99 | 3.16 | 0.039 |
| | 10000 | 2034 | 0.20 | 1.15 | 0.96 | 3.15 | 0.034 | 0.12 | 0.99 | 2.35 | 0.029 |
| 12 | 5000 | 2886 | 0.38 | 2.42 | 0.97 | 6.50 | −0.14 | 0.26 | 0.99 | 3.75 | 0.056 |
| | 10000 | 3746 | 0.31 | 1.96 | 0.96 | 5.02 | −0.081 | 0.17 | 0.99 | 3.03 | 0.042 |
indicates the number of conditions with .
bias, average bias; mse, average mean squared error; cov, average coverage probability; width, average mean width of 95% credible intervals; , average of ACE (21) over 50 simulation replicates.
Table 5 reports the average calculated by averaging ACE across all simulation replicates. correlates with the average mse for estimating : it increases as increases and decreases, but is low in all the scenarios we considered. One can improve and estimation accuracy by increasing and the number of repetitions in the SRSM algorithm. For example, we ran simulation using 100 repetitions (instead of 50) in SRSM for the scenario and and obtained (vs. 0.056) and an average mse of 0.24 (vs. 0.26). While the improvement is minimal because 50 repetitions seem to be adequate, it suggests is a good proxy for the adequacy of the SRSM algorithm. An advantage of using over the use of mean squared errors is that takes values on [0, 1], thus providing an index that facilitates benchmarking and interpretation of the accuracy of the method.
Finally and importantly, credible intervals of both methods achieve the nominal level, although iPIPE seems to be conservative and at the same time reduces the average width of the intervals. This indicates iPIPE retains its accuracy in quantifying uncertainty even under these sparse settings.
6 |. DISCUSSION
In this article, we deal with the estimation of a monotone response surface defined on multiple factors and observed on a large number of distinct conditions. We make two main contributions. First, we have proposed an estimation method, called iPIPE, by inverting a partially ordered classifier ensemble (PIPE-classifiers). While the PIPE-classifiers are motivated by a decision-theoretic framework with a classification-type gain function, they may be viewed as a projection of Bayes classifiers onto the constrained space of partial ordering. iPIPE is nonparametric in that the method does not rely on any assumptions (e.g. additivity and smoothness) other than monotonicity. In our data examples and simulation, we have demonstrated that point estimation based on iPIPE behaves similarly to PAVA. The Bayesian decision-theoretic framework facilitates interval estimation and we have demonstrated that iPIPE-based 95% credible intervals achieve the frequentist coverage probability, and in fact, are conservative in our simulation scenarios. Such conservativeness interestingly comes with higher precision (shorter widths) when compared to the unconstrained Bayesian inference, and will warrant further investigation. Additionally, simulation results consistently show iPIPE has smaller mse than unconstrained estimation; and the efficiency gain is particularly substantial when sampling of conditions is sparse. Also, iPIPE is versatile. It can be applied with many common statistical models described in Section 4, and it is potentially applicable to advanced semi-parametric models such as in spatiotemporal modeling; e.g., in Gaussian processes for spatial data (e.g., Banerjee et al. (2008); Datta et al. (2016)), the estimation of the cross-covariance function may be improved using iPIPE as the covariance between two locations is conceivably non-increasing in the distance between the locations. This represents an interesting and important line of future research.
Second, we have proposed algorithms that render iPIPE computationally feasible for estimating moderate-to-high dimensional response surfaces, while the existing literature on estimating multivariate monotone functions has little discussion of situations with . Specifically, we have proposed a sweep algorithm and have proved that it gives the true iPIPE defined in (7). At first glance, estimation by inverting a classification problem is more computationally intensive than the classification problem itself, because it involves iterating a threshold on a fine grid and it has to solve the classifiers for each . The sweep algorithm is interesting in that it takes advantage of the iteration step together with a sweep step (step 2c) to reduce the optimisation problem in classification (PIPE) into a more manageable subset maximisation step (step 2b). That is, the sweep algorithm integrates the two problems: estimation (iPIPE) can be a means to evaluating the classifiers (PIPE), while the former is in principle constructed by evaluating the latter at all thresholds. We have also proposed a sequential random subset maximisation (SRSM) algorithm to supplement the sweep algorithm. The idea of SRSM is to further reduce the subset maximisation step in sweep (step 2b) into even smaller computation tasks over sequentially selected random subsets. While there is no theoretical guarantee of giving the true maximisers, the SRSM method identifies the true maximisers in our data illustrations; and its likelihood of success can be enhanced by running the algorithm many times. We have applied the sweep/SRSM algorithm to analyse simulated data with factors and conditions. We note that the computational costs of SRSM grow linearly in , as opposed to . In addition, while SRSM performs computation tasks over subsets of conditions sequentially, a possible alternative is to perform maximisation of each subset in parallel, and then pool and harmonise the results with respect to the constraint. Subset maximisation thus naturally lends itself to a divide-and-conquer approach (Guhaniyogi and Banerjee, 2018; Jordan et al., 2019), which can be implemented in parallel on multi-core machines or high-performance computing clusters. As such, the method can be scaled to address massive problems due to large dimension by leveraging the underlying computational architecture.
Supplementary Material
acknowledgements
This work was supported by NIH grants R01HL153642, R01MH109496, and UL1TR001873. This work was also supported by the Robert N. Butler Columbia Aging Center of Columbia University.
Appendix
| Appendix A: Proof of Lemma 1
For brevity, we will omit the threshold in the proof of Lemma 1. Restating Lemma 1, we aim to prove for all for a given .
First, since and , we have by definition.
Next, partition the set into , and . Because , we have for by monotonicity; and since , we have on . Similarly, we observe that for because , and hence on . Further split into two sets: and . By the definition of , we have for . The proof of Lemma 1 will be completed by proving:
Claim 1. for .
Recall that denotes the subvector on , and suppose . Construct a classifier ensemble as follows:
| (22) |
Claim 2. . Hence, because .
Proof of Claim 2: Since , we have . Thus, . Also by (22), we have . That is, partial ordering of holds within and . To prove Claim 2, it remains to show that partial ordering of holds for every pair and :
First, consider the case where ; and recall that for . Since , partial ordering will hold between and implying that . Because can take on any value while partial ordering will hold between and .
Second, consider the case where . Note that : because would imply , which would in turn put by definition of . As a result, because can take on any value while partial ordering will hold between and .
This completes the proof of Claim 2.
Next, write where
| (23) |
i.e., dependence on is omitted for brevity. Then we have
| (24) |
based on the definitions of , and . Similarly, we can write
| (25) |
where . Because maximises on by definition (8) and per Claim 2, we have . Applying this with (24) and (25), we obtain the inequality
| (26) |
for any .
Finally, suppose that for some where maximises on per (8). Construct an ensemble as follows: define
| (27) |
Using similar arguments used in the proof of Claim 2 above, we can show that . Further since , the subvector . Then we obtain
| (28) |
The inequality in (28) is a result of (26) and the fact that . However, the inequality (28) contradicts the definition of as the maximiser of on , except when for all . Thus, by contradiction, for any . This completes the proof of Claim 1 and Lemma 1.
Appendix B: Proofs of Theorem 1, Proposition 3, and Proposition 4
Proof of Theorem 1: First consider a fixed as in Lemma 1. The maximiser of over will be either or , the respective maximisers over and . Specifically,
| (29) |
| (30) |
Equation (29) holds because and expanding on (30) gives
| (31) |
where and are defined in (23). Dividing both sides of (31) by its right-hand side further gives
| (32) |
The first three terms in (32) are increasing functions in , because it can be easily verified that is increasing and is decreasing in . In addition, Lemma 1 implies that and therefore the fourth term in (32) is also increasing . Therefore, for any , the inequality (32) implies
| (33) |
which is equivalent to using the same logic as in (29) and (30). We have thus shown for any given and thereby completed the proof of Theorem 1.
Proof of Proposition 3: Let denote the posterior cdf of and assume that it is continuous, so that . The estimator will thus solve .
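As a purely numerical illustration of this inversion, and under our own assumptions rather than the paper's notation: given Monte Carlo draws from a posterior with continuous cdf, the point at which the posterior cdf attains a given level can be read off as the corresponding posterior quantile. The Beta posterior and the level q below are hypothetical.

```python
# Minimal sketch: solving F(theta) = q for a posterior cdf F from Monte Carlo draws.
# The Beta(4, 8) posterior and the level q = 0.5 are hypothetical choices for illustration.
import numpy as np

rng = np.random.default_rng(2)
draws = rng.beta(4, 8, size=100_000)   # draws from a (hypothetical) posterior

q = 0.5
theta_hat = np.quantile(draws, q)      # the value solving F_hat(theta_hat) = q

# Sanity check: the empirical posterior cdf evaluated at theta_hat is approximately q.
print(theta_hat, np.mean(draws <= theta_hat))
```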
Proof of Proposition 4: First, we note that maximises over , by applying Proposition 2 with the fact that for (hence ). Next, we note that for any pair and . Thus, the ensemble formed by putting and together will belong to ; and since they maximise and respectively, the ensemble thus formed maximises .
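The final step invokes the generic fact that a separable objective over a product of feasible sets is maximised blockwise; in our own notation, which need not match the paper's,
\[
\max_{(a,b)\in\mathcal{A}\times\mathcal{B}}\bigl\{g_1(a)+g_2(b)\bigr\}
= \max_{a\in\mathcal{A}} g_1(a) + \max_{b\in\mathcal{B}} g_2(b),
\]
with the maximum attained by pairing any maximiser of \(g_1\) with any maximiser of \(g_2\), which is how the combined ensemble is formed here.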
Footnotes
CONFLICT OF INTEREST
The authors have no conflicts of interest to disclose.
DATA AVAILABILITY STATEMENT
The data underlying this article will be shared on reasonable request to the corresponding author.
REFERENCES
- Ayer M, Brunk HD, Ewing GM, Reid WT and Silverman E (1955) An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26, 641–647.
- Bacchetti P (1989) Additive isotonic regression. Journal of the American Statistical Association, 84, 289–294.
- Banerjee S, Gelfand AE, Finley AO and Sang H (2008) Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. B, 70, 825–848.
- Barlow RE, Bartholomew DJ, Bremner JM and Brunk HD (1972) Statistical Inference Under Order Restrictions. John Wiley & Sons, New York.
- Bornkamp B, Ickstadt K and Dunson D (2010) Stochastically ordered multiple regression. Biostatistics, 11, 419–431.
- Brunk HD (1955) Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics, 26, 607–616.
- Burridge J (1981) Empirical Bayes analysis of survival time data. J. R. Statist. Soc. B, 43, 65–75.
- Cheung K, Ling W, Karr CJ, Weingardt K, Schueller SM and Mohr DC (2018) Evaluation of a recommender app for apps for the treatment of depression and anxiety: an analysis of longitudinal user engagement. Journal of the American Medical Informatics Association, 25, 955–962.
- Cheung Y, Chandereng T and Diaz KM (2022) A novel framework to estimate multidimensional minimum effective doses using asymmetric posterior gain and e-tapering. The Annals of Applied Statistics, 16, 1445–1458.
- Chung Y, Ivanova A, Hudgens M and Fine J (2018) Partial likelihood estimation of isotonic proportional hazards models. Biometrika, 105, 133–148.
- Datta A, Banerjee S, Finley AO and Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812.
- Dellaportas P and Smith A (1993) Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Appl. Statist., 42, 443–459.
- Fu A, Narasimhan B and Boyd S (2020) CVXR: An R package for disciplined convex optimization. Journal of Statistical Software, 94, 1–34.
- Gelman A, Carlin J, Stern H and Rubin D (1995) Bayesian Data Analysis. Chapman & Hall.
- Gilks W and Wild P (1992) Adaptive rejection sampling for Gibbs sampling. Appl. Statist., 41, 337–348.
- Guhaniyogi R and Banerjee S (2018) Meta-kriging: scalable Bayesian modeling and inference for massive spatial datasets. Technometrics, 60, 430–444.
- Holmes CC and Heard NA (2003) Generalized monotonic regression using random change points. Statistics in Medicine, 22, 623–638.
- Ibrahim J and Laud P (1991) On Bayesian analysis of generalized linear models using Jeffreys's prior. Journal of the American Statistical Association, 86, 981–986.
- Jordan MI, Lee JD and Yang Y (2019) Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681.
- Leitenstorfer F and Tutz G (2007) Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics, 8, 654–673.
- Lin L and Dunson DB (2014) Bayesian monotone regression using Gaussian process projection. Biometrika, 101, 303–317.
- Mander A and Sweeting M (2015) A product of independent beta probabilities dose escalation design for dual-agent phase I trials. Statistics in Medicine, 34, 1261–1276.
- McCullagh P and Nelder J (1989) Generalized Linear Models. Chapman & Hall/CRC, second edn.
- Morton-Jones T, Diggle P, Parker L, Dickinson HO and Binks K (2000) Additive isotonic regression models in epidemiology. Statistics in Medicine, 19, 849–859.
- Ramsay JO (1988) Monotone regression splines in action. Statistical Science, 3, 425–441.
- Robertson T, Wright FT and Dykstra RL (1988) Order Restricted Statistical Inference. John Wiley & Sons, New York.
- Sinha D and Dey DK (1997) Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92, 1195–1212.
- Sinha D, Ibrahim J and Chen M (2003) A Bayesian justification of Cox's partial likelihood. Biometrika, 90, 629–641.
- Stan Development Team (2021) RStan: the R interface to Stan. R package version 2.21.3, https://mc-stan.org/.
- Stein J, Rodstein BM, Levine SR, Cheung K, Sicklick A, Silver B, Hedeman R, Egan A, Borg-Jensen P and Magdon-Ismail Z (2022) Which road to recovery?: Factors influencing postacute stroke discharge destinations: A Delphi study. Stroke, 53, 947–955.
- Wang Y and Taylor J (2004) Monotone constrained tensor-product B-spline with application to screening studies. The University of Michigan Department of Biostatistics Working Paper Series, 1022, Berkeley Electronic Press.
- Wright FT (1982) Monotone regression estimates for grouped observations. The Annals of Statistics, 10, 278–286.