J R Stat Soc Series B Stat Methodol. 2023;85(2):497–522. doi: 10.1093/jrsssb/qkad014

Monotone response surface of multi-factor condition: estimation and Bayes classifiers

Ying Kuen Cheung 1,*, Keith M Diaz 2

Abstract

We formulate the estimation of a monotone response surface of multiple factors as the inverse of an iteration of partially ordered classifier ensembles. Each ensemble (called a PIPE-classifier) is a projection of Bayes classifiers onto the constrained space. We prove that the inverse of PIPE-classifiers (iPIPE) exists, and propose algorithms that compute iPIPE efficiently by reducing the space over which optimisation is conducted. The methods are applied in analysis and simulation settings where the surface dimension is higher than what the isotonic regression literature typically considers. Simulation shows that iPIPE-based credible intervals achieve nominal coverage probability and are more precise than unconstrained estimation.

Keywords: Clinical decision support tool, partial ordering, posterior quantiles, sweep algorithm, weighted posterior gain

1 |. INTRODUCTION

In clinical studies and health systems, interventions and patient conditions are often defined by multiple factors. To assess the total effect of an intervention or a condition, we can estimate the response surface as a multivariate function of the individual factors. In this article, we focus on monotone response surfaces, a reasonable assumption in many applications such as dose-response studies and clinical decision making. Specifically, consider a study with $K$ multi-factor conditions. Let $x_k = (x_{k1}, x_{k2}, \ldots, x_{kD})$ denote the $k$th condition, for $k \in \{1, 2, \ldots, K\}$, where $x_{kd}$ indicates the state of the $d$th factor, and let $\theta_k = \theta(x_k)$ denote the parameter of interest associated with the condition. We are concerned with the estimation of the monotone response surface $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$, where $K$ is large in many applications. We assume without loss of generality that $\theta$ is nondecreasing in $x$ in terms of the partial Euclidean ordering ($\succ$): if $x_{k'} \succ x_k$, then $\theta_{k'} \ge \theta_k$, where $x_{k'} \succ x_k$ denotes the event $x_{k'd} \ge x_{kd}$ for each component $d$ with at least one strict inequality.

To motivate our work, consider a recent Delphi study in which an expert panel identified important factors that influence the selection of postacute care for stroke patients (Stein et al., 2022), including four main factors: likelihood of benefitting from active rehabilitation (factor 1), need for clinicians with specialised rehabilitation skills (factor 2), need for ongoing medical and nursing care (factor 3), and patient's ability to tolerate rehabilitation (factor 4); and three minor factors: family/caregiver support (factor 5), likelihood of return to community (factor 6), and ability to return to physical home (factor 7). While the presence of each of these factors increases the likelihood of referral to an inpatient rehabilitation program, we currently plan to conduct chart reviews to understand the combined effects of these factors and develop a clinical decision support tool. In each patient chart, a main factor will be scored as 0, 1, and 2 respectively for the answer “no”, “uncertain”, and “yes”, and a minor factor will be scored as 0 for “no/uncertain” and 1 for “yes”. The outcome will be noted as whether the patient was referred to rehabilitation. In summary, each condition consists of $D = 7$ factors, each taking on two or three possible values, and there are a total of $K = 3^4 \times 2^3 = 648$ conditions. An unconstrained estimate of each $\theta_k$, the underlying referral probability associated with condition $k$, can be obtained based on the number $m_k$ of patients under condition $k$ and the number $y_k$ of patients referred to rehabilitation among them, for $k = 1, \ldots, 648$. In the simple cases of estimating population parameters associated with different conditions, we will use $y_k$ to generically denote the data associated with condition $k$, with distribution $f(y_k \mid \theta_k, \vartheta_k)$ where the nuisance parameter $\vartheta_k$ may be shared across conditions. Notation for increasingly complicated model set-ups will be defined and explained for specific applications in Section 4.

There is a large literature on monotone or isotonic regression, which can be formulated as a restricted least squares optimisation problem and solved using the pool-adjacent-violators algorithm (PAVA); see Brunk (1955), Ayer et al. (1955), Barlow et al. (1972), and Robertson et al. (1988). Numerous approaches have been proposed to deal with multivariate isotonic regression, including additive models (Bacchetti, 1989; Morton-Jones et al., 2000), spline methods using monotone basis functions (Ramsay, 1988; Leitenstorfer and Tutz, 2007), Bayesian mixture modeling (Bornkamp et al., 2010), and projective Gaussian processes (Lin and Dunson, 2014). These methods deal with continuous functions, where additional assumptions such as piecewise linearity, additivity, and smoothness are used to make computations tractable. A monotone response surface defined over a continuous support may also be estimated using tensor-product splines with monotone basis functions on the margins and appropriate constraints on the coefficients of the basis functions; see Wang and Taylor (2004) for an application that uses B-spline bases. As such, computations can be formulated as a convex optimisation problem, which statistical packages such as the R package CVXR (Fu et al., 2020) can solve quite efficiently. Isotonic regression has also been recently studied for survival analysis. Chung et al. (2018) propose a pseudo-iterative convex minorant algorithm, with theoretical justifications, to implement PAVA and maximise the partial likelihood under isotonic proportional hazards models for right-censored data. While their approach exhibits computational stability under a piecewise constant assumption, the algorithm is applied to a single continuous covariate and focuses on point estimation. Generally, optimisation and inference for isotonic regression become complex and challenging as the dimension $D$ increases, and most methods have been demonstrated in problems of relatively low dimension ($D = 2$ to $4$).

Our work is motivated by a number of considerations that render the above-mentioned approaches not directly applicable. First, we focus on applications where the response surface is observed on discrete levels per factor with a moderate-to-high number $D$ of factors. While Wright (1982) studies maximum likelihood estimation of a univariate function observed on discrete levels, there is relatively little discussion of isotonic regression for multiple discrete factors. Second, we adopt a Bayesian decision-theoretic framework to deal with inference, including interval estimation. For multivariate isotonic functions of discrete factors, a pragmatic Bayesian approach is to first draw from the posterior distribution based on an unconstrained model and then include only draws that meet partial ordering; see Holmes and Heard (2003) for example. As will be illustrated, the constrained posterior thus obtained may cause bias, and its feasibility is limited as $D$ increases. Third, to ensure broad applicability, we aim to develop a general approach that can work with different statistical and regression models, including generalised linear models, mixed effects models, Bayesian hierarchical models, and survival models.

To address these considerations, we propose in this article the estimation of the response surface $\theta$ by inverting an iterated sequence of partially ordered classifier ensembles, each of which is solved by projecting unconstrained Bayes classifiers onto the space constrained by partial ordering. The proposed classifier ensemble may be viewed as an extension of the product-of-independent-probability-escalation (PIPE) method of Mander and Sweeting (2015) for two-dimensional dose-response, and will thus be called PIPE-classifiers; the estimator for $\theta$ obtained by its inverse will be called iPIPE. Cheung et al. (2022) develop a decision-theoretic framework to motivate PIPE and outline the principle of extension to dimension $D > 2$ without examining its computational feasibility. In this article, we propose efficient computation algorithms that solve PIPE-classifiers and iPIPE simultaneously and demonstrate their feasibility for high-dimensional problems.

2 |. METHODS

2.1 |. A partially ordered classifier ensemble

We first introduce a classification problem of condition $k$ with respect to some threshold $t$. Define the classifier ensemble $\gamma(t) = (\gamma_1(t), \gamma_2(t), \ldots, \gamma_K(t))$, where $\gamma_k(t) = I(\theta_k > t)$ and $I(\cdot)$ is an indicator function. As a nondecreasing function of $\theta_k$, $\gamma(t)$ is also nondecreasing in $x$ in terms of partial ordering. We let $\Gamma$ denote the constrained space in which $\gamma(t)$ lives. More generally, define

$$\Gamma(\mathcal{S}) = \left\{ g \in \{0, 1\}^{|\mathcal{S}|} : g_{k'} \ge g_k \text{ for } x_{k'} \succ x_k;\ k, k' \in \mathcal{S} \right\} \quad \text{for some } \mathcal{S} \subseteq \{1, \ldots, K\}. \tag{1}$$

Then $\gamma(t) \in \Gamma(\{1, \ldots, K\}) \equiv \Gamma$. Further, let $\gamma_{\mathcal{S}}(t) = \{\gamma_k(t) : k \in \mathcal{S}\}$ denote the subvector of $\gamma(t)$ on $\mathcal{S}$; it is easy to see that $\gamma(t) \in \Gamma$ implies $\gamma_{\mathcal{S}}(t) \in \Gamma(\mathcal{S})$ for any $\mathcal{S}$.

We consider estimation by maximising the objective function $H_{\mathcal{S}}(g; t)$ defined as follows:

$$\hat\gamma_{\mathcal{S}}(t) = \underset{g \in \Gamma(\mathcal{S})}{\arg\max}\, H_{\mathcal{S}}(g; t) \equiv \underset{g \in \Gamma(\mathcal{S})}{\arg\max} \prod_{k \in \mathcal{S}} \left[ \{\epsilon p_k(t)\}^{g_k} \{(1 - \epsilon)(1 - p_k(t))\}^{1 - g_k} \right]^{w_k} \tag{2}$$

where $p_k(t) = E\{\gamma_k(t) \mid y_1, \ldots, y_K\}$ is the expectation of $\gamma_k(t)$ taken with respect to the posterior distribution of $\theta$ (to be elaborated in Section 4), the weight $w_k \ge 0$ is chosen to reflect the information content about condition $k$, and $\epsilon \in (0, 1)$. For brevity, we suppress the dependence of $H_{\mathcal{S}}(g; t)$ on $\epsilon$ in our notation. The subscript $\mathcal{S}$ will also be omitted when $\mathcal{S} = \{1, \ldots, K\}$, e.g., writing $\gamma_{\{1,\ldots,K\}}(t)$ as $\gamma(t)$, $H_{\{1,\ldots,K\}}$ as $H$, etc.

Proposition 1. $H_{\mathcal{S}}(g; t)$ is a weighted product of posterior gains over $k \in \mathcal{S}$ with respect to the gain function

$$h_\epsilon\{g; \gamma_k(t)\} = \epsilon g \gamma_k(t) + (1 - \epsilon)(1 - g)\{1 - \gamma_k(t)\}, \quad \text{for } g = 0, 1. \tag{3}$$

Proposition 1 can be easily verified by taking expectations of the right-hand side of (3) with respect to the posterior distribution of $\theta$ for each $k$. Under this framework, $\epsilon$ may be viewed as a decision parameter that defines the relative gains of a true negative and a true positive decision (Cheung et al., 2022), and may be used to control classification errors.

When estimating an individual $\gamma_k(t)$, we can also show that $\hat\gamma_k^B(t) = I\{p_k(t) > 1 - \epsilon\}$ is a Bayes estimator for $\gamma_k(t)$ in that it maximises the gain function (3).

Proposition 2. If $\hat\gamma^B(t) = (\hat\gamma_1^B(t), \ldots, \hat\gamma_K^B(t)) \in \Gamma$, then $H(g; t)$ is maximised at $g = \hat\gamma^B(t)$.

Proposition 2 can be verified by observing that $\hat\gamma_k^B(t)$ maximises the $k$th factor in the product in (2), so that

$$H\{\hat\gamma^B(t); t\} \ge H(g; t) \quad \text{for all } g \in \{0, 1\}^K. \tag{4}$$

Combining (2) and (4) and setting $\mathcal{S} = \{1, \ldots, K\}$, we can write

$$\hat\gamma(t) = \underset{g \in \Gamma}{\arg\min}\, \left[ H\{\hat\gamma^B(t); t\} - H(g; t) \right]. \tag{5}$$

As such, the classifier ensemble $\hat\gamma(t)$ may be viewed as a projection of the Bayes classifier ensemble onto the constrained space $\Gamma$. Proposition 2 implies that if $\hat\gamma_k^B(t)$ is evaluated under a joint posterior of $\theta$ whose support satisfies partial ordering, the estimator $\hat\gamma(t)$ will be its own projection. To ensure $\hat\gamma^B(t) \in \Gamma$, one could use parametric models such as linear additive models to impose monotonicity. Alternatively, motivated by the ease and scalability of computing the unconstrained $p_k(t)$ independently and keeping model assumptions to a minimum, we propose applying (2) in conjunction with the unconstrained distribution of $\theta$. The estimator $\hat\gamma(t)$ thus obtained will be called a PIPE-classifier, as coined in Mander and Sweeting (2015), who introduce the special case of (2) with $\epsilon = 0.5$ and $w_k \equiv 1$ for binomial outcomes over a two-dimensional grid.
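To make the projection in (2) concrete, the following minimal sketch in R enumerates the constrained space $\Gamma$ by brute force and maximises the weighted product on the log scale. It is an illustration for small $K$ only, not the authors' implementation; the function and argument names (pipe_classifier, order_pairs) are ours.

```r
# Brute-force PIPE-classifier per (2): enumerate all g in {0,1}^K, keep
# those satisfying the partial-order pairs, and maximise log H(g; t).
pipe_classifier <- function(p, w, order_pairs, eps = 0.5) {
  K <- length(p)
  grid <- as.matrix(do.call(expand.grid, rep(list(0:1), K)))
  # keep ensembles with g[k'] >= g[k] whenever x_k < x_k'
  ok <- apply(grid, 1, function(g)
    all(g[order_pairs[, 2]] >= g[order_pairs[, 1]]))
  Gamma <- grid[ok, , drop = FALSE]
  logH <- Gamma %*% (w * log(eps * p)) +
    (1 - Gamma) %*% (w * log((1 - eps) * (1 - p)))
  Gamma[which.max(logH), ]
}

# Toy example: a chain x_1 < x_2 < x_3 with unconstrained p_k(t); the
# unconstrained Bayes classifier (1, 0, 1) violates partial ordering,
# and the projection returns a monotone ensemble.
p <- c(0.9, 0.3, 0.6)
pairs <- rbind(c(1, 2), c(2, 3))
pipe_classifier(p, w = rep(1, 3), order_pairs = pairs)
```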

2.2 |. Inverting PIPE-classifiers (iPIPE)

In this subsection, we return to the estimation of $\theta$. Viewing the estimand $\theta_k$ as an inverse of $\gamma_k(t)$, we may write

$$\theta_k = \gamma_k^{-1}(0) = \min\{t : \gamma_k(t') = 0 \text{ for all } t' \ge t\} = \min\{t : \gamma_k(t) = 0\}. \tag{6}$$

The last equality in (6) holds because $\gamma_k(t)$ is nonincreasing in $t$, and as such, its inverse is unambiguously defined. If a PIPE-classifier $\hat\gamma_k(t)$ is nonincreasing in $t$, an estimator for $\theta_k$ can be analogously defined as its inverse:

$$\hat\theta_k = \hat\gamma_k^{-1}(0) = \min\{t : \hat\gamma_k(t') = 0 \text{ for all } t' \ge t\} = \min\{t : \hat\gamma_k(t) = 0\}. \tag{7}$$

Lemma 1. Partition $\Gamma$ into $\Gamma_{i0} = \{g \in \Gamma : g_i = 0\}$ and $\Gamma_{i1} = \{g \in \Gamma : g_i = 1\}$ for a given $i \in \{1, \ldots, K\}$. Define

$$\hat\gamma_{il}(t) = \underset{g \in \Gamma_{il}}{\arg\max}\, H(g; t) \quad \text{for } l = 0, 1. \tag{8}$$

Then $\hat\gamma_{i1}(t) \ge \hat\gamma_{i0}(t)$. That is, $\hat\gamma_{i1,k}(t) \ge \hat\gamma_{i0,k}(t)$ for all $k \in \{1, \ldots, K\}$, where $\hat\gamma_{il,k}(t)$ is the $k$th element of $\hat\gamma_{il}(t)$.

Theorem 1. Let $\hat\gamma_k(t)$ denote the $k$th element of $\hat\gamma(t)$ defined in (2). If $\hat\gamma_k(t) = 0$, then $\hat\gamma_k(t') = 0$ for $t' > t$.

Theorem 1 shows that $\hat\gamma_k(t)$ is nonincreasing in $t$, so the inverse of a PIPE-classifier (7) is well defined. In addition, the following result provides the basis for choosing $\epsilon$ for point and interval estimation.

Proposition 3. If $\hat\gamma^B(t) \in \Gamma$ for all $t$, then $\hat\theta_k$ is equal to the $\epsilon$-quantile of the posterior distribution of $\theta_k$.

Generally, we do not expect $\hat\gamma^B(t) \in \Gamma$ except in special cases, such as when using parametric models as discussed after Proposition 2. Proposition 3 however provides an interpretation of $\epsilon$ in the context of estimation. Specifically, we may take $\hat\theta_k$ with $\epsilon = 0.5$, i.e., the posterior median, as a point estimate for $\theta_k$, and obtain a 95% credible interval by evaluating $\hat\theta_k$ at $\epsilon = 0.025, 0.975$.

The proofs of Lemma 1, Theorem 1, and Proposition 3 are given in the Appendix.

3 |. COMPUTATION ALGORITHMS

3.1 |. A sweep algorithm

The main motivation for using the PIPE-classifier (2) is that each individual factor in the product can be computed easily and quickly without consideration of partial ordering; thus computations can be scaled to deal with problems whose partial ordering structure becomes complex as $D$ increases. When the dimension $D$ and the number $K$ of conditions are small, one can evaluate the set $\Gamma$ by brute force, i.e., enumerating each $g \in \{0, 1\}^K$ and checking whether it belongs to $\Gamma$. Once $\Gamma$ is determined, the additional computational cost of (2) is $K|\Gamma|$. Enumerating the entire set $\Gamma$, however, quickly becomes infeasible as $D$ and $K$ increase.

Suppose there exists $t_L$ such that $\hat\gamma_k^B(t) = 1$ for all $k$ and all $t \le t_L$, and $t_U$ such that $\hat\gamma_k^B(t) = 0$ for all $k$ and $t \ge t_U$. Under this assumption, Proposition 2 implies that $\hat\gamma(t_L) = \mathbf{1}$ and $\hat\gamma(t_U) = \mathbf{0}$ because $\hat\gamma^B(t_L), \hat\gamma^B(t_U) \in \Gamma$. Then the following sweep algorithm solves the inverse $\hat\theta$ of PIPE-classifiers, or iPIPE, without the need to enumerate $\Gamma$:

  1. Iterate $t$ from $t_L$ to $t_U$.

  2. For each $t$:
    (a) Identify subset: Let $\mathcal{Z}_t = \{k : \hat\gamma_k^B(t) = 0\}$ and $\underline{\mathcal{Z}}_t = \{j : x_j \prec x_k, k \in \mathcal{Z}_t\}$. Define $C_t = (\mathcal{Z}_t \cup \underline{\mathcal{Z}}_t) \setminus \mathcal{D}_{0t}$, where $\mathcal{D}_{0t}$ is the index set of the $\hat\gamma_k(t)$ that have been determined to be 0, initially the null set.
    (b) Maximise: Evaluate the PIPE-classifiers $\hat\gamma_{C_t}(t)$ and set $\hat\gamma(t)$ equal to $\hat\gamma_{C_t}(t)$ on the subset $k \in C_t$. Set $\hat\gamma_k(t) = 1$ for the remaining $k \notin C_t \cup \mathcal{D}_{0t}$.
    (c) Sweep zeros: For all $k$ with $\hat\gamma_k(t) = 0$, set $\hat\gamma_k(t') = 0$ for all $t' \in (t, t_U]$ and add these indices to $\mathcal{D}_{0t}$.

  3. Stop when $\hat\gamma_k(t) = 0$ for all $k \in \{1, \ldots, K\}$, and evaluate $\hat\theta_k = \min\{t : \hat\gamma_k(t) = 0\}$ according to (7).
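The R sketch below outlines the sweep algorithm under simplifying assumptions: pk is a user-supplied function returning the unconstrained $p_k(t)$, order_pairs lists all comparable pairs $(k, k')$ with $x_k \prec x_{k'}$ (transitive closure), and pipe_classifier() is the brute-force helper sketched in Section 2.1. All names are illustrative, not the authors' code.

```r
# Sweep algorithm sketch: iterate t on a grid, restrict the projection
# (step 2b) to the small subset C_t, and sweep determined zeros forward.
sweep_ipipe <- function(pk, K, order_pairs, w, eps = 0.5,
                        t_grid = seq(0.001, 0.999, by = 0.001)) {
  theta_hat <- rep(NA_real_, K)
  D0 <- integer(0)                  # indices with gamma-hat determined 0
  for (t in t_grid) {
    p <- vapply(seq_len(K), function(k) pk(t, k), numeric(1))
    Z <- setdiff(which(p <= 1 - eps), D0)            # Bayes classifier = 0
    Zlow <- order_pairs[order_pairs[, 2] %in% Z, 1]  # conditions below Z
    Ct <- setdiff(union(Z, Zlow), D0)
    if (length(Ct) > 0) {
      sp <- order_pairs[order_pairs[, 1] %in% Ct &
                        order_pairs[, 2] %in% Ct, , drop = FALSE]
      g <- pipe_classifier(p[Ct], w[Ct],
                           matrix(match(sp, Ct), ncol = 2), eps)
      newly0 <- Ct[g == 0]
      theta_hat[newly0] <- t        # step 2c: zeros stay 0 for t' > t
      D0 <- union(D0, newly0)
    }
    if (length(D0) == K) break      # all classifiers have reached 0
  }
  theta_hat                         # iPIPE estimates per (7)
}
```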

For the sweep algorithm to yield the true maximiser $\hat\gamma(t)$ for all $t$, we require:

Proposition 4. Step 2b yields the true maximiser $\hat\gamma(t)$ of $H(g; t)$ for the given $t$.

As a consequence of Proposition 4 and Theorem 1, sweeping zeros across the remaining $t'$ (step 2c) yields the correct $\hat\gamma_k(t')$ for all $k \in \mathcal{D}_{0t}$.

The core idea of the algorithm is to break down the maximisation problem into maximisations over the subsets $C_t$. In particular, since we start with a small value of $t$, the set $\mathcal{Z}_t$ and hence $C_t$ are small at the beginning. As the algorithm iterates across $t$, the set $\mathcal{D}_{0t}$ of determined zeros grows, thus limiting the size of $C_t$ and rendering the maximisation step feasible. Specifically, the computational cost in step 2b for each $t$ is $|C_t| \times |\Gamma(C_t)|$, once $C_t$ is determined.

Note that, as an easy corollary to Theorem 1, if $\hat\gamma_k(t) = 1$, then $\hat\gamma_k(t') = 1$ for $t' < t$. Thus, one can define an analogous algorithm that starts at $t = t_U$ and sweeps ones in the opposite direction.

3.2 |. Numerical illustration: clinical decisions for rehabilitation

We illustrate the sweep algorithm using a simulated data set in the context of evaluating clinical decisions for referring stroke patients to rehabilitation, described in Section 1. For brevity in presenting the results, we consider only the four main factors in this subsection, i.e., $D = 4$ and $K = 3^4 = 81$. Even in this reduced problem, there are $2^{81}$ possible classifier ensembles without the partial ordering constraint, and enumerating the partially ordered set $\Gamma$ from the unconstrained space would be computationally prohibitive.

In the simulated data set, for each condition, we first generated the number $m_k$ of patients with that condition and then generated the number $y_k$ of patients referred to rehabilitation given $m_k$, i.e., $y_k \sim \text{binomial}(m_k, \theta_k)$, where $\theta_k$ was the referral probability for condition $k$. To analyse the simulated data, we postulated a uniform prior on $\theta_k$ for all $k$, so that the unconstrained posterior distribution was $\theta_k \sim \text{beta}(y_k + 1, m_k - y_k + 1)$, based on which each $p_k(t)$ was evaluated.
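Under this beta posterior, $p_k(t)$ has a closed form; the short R sketch below (our illustration, with the assumed helper name pk_beta) reproduces the thresholds seen in Table 1.

```r
# p_k(t) = P(theta_k > t | y_k) under the beta(y_k + 1, m_k - y_k + 1)
# posterior of the illustration
pk_beta <- function(t, y, m) 1 - pbeta(t, y + 1, m - y + 1)

pk_beta(0.020, y = 0, m = 34)  # condition (0,1,0,1): just below 0.5,
                               # so its Bayes classifier is 0 at t = 0.020
pk_beta(0.020, y = 0, m = 23)  # condition (0,0,0,0): still above 0.5 here,
pk_beta(0.029, y = 0, m = 23)  # but drops below 0.5 by t = 0.029
```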

While the simulated data, the specifications of $m_k$ and $\theta_k$, and the analysis results are provided in the web-based supporting materials, Table 1 gives some intermediate steps of the sweep algorithm applied to the simulated data with $\epsilon = 0.5$, described as follows:

  • We first determine the lower limit $t_L = 0.019$ as the largest threshold (to the third decimal place) such that $\hat\gamma_k^B(t_L) = 1$ for all $k$. We then iterate $t$ on a grid with increments of 0.001.

  • When $t = 0.020$, the condition $x_k = (0, 1, 0, 1)$ is associated with $\hat\gamma_k^B(0.020) = 0$, and thus belongs to $\mathcal{Z}_{0.020}$; see the first row under the column $t = 0.020$ in Table 1. By partial ordering, the conditions (0, 0, 0, 0), (0, 1, 0, 0), and (0, 0, 0, 1) are included in $\underline{\mathcal{Z}}_{0.020}$; rows 2–4 in the table. Thus, the set $C_{0.020}$ consists of 4 conditions. Applying (2) on the subset $C_{0.020}$ gives $\hat\gamma_k(0.020) = 1$ for all $k \in C_{0.020}$.

  • When $t = 0.029$, the set $C_{0.029}$ remains the same, although the condition (0, 0, 0, 0) now belongs to $\mathcal{Z}_{0.029}$ and its associated $\hat\gamma_k(0.029) = 0$ by maximising $H_{C_{0.029}}$ over $\Gamma(C_{0.029})$. As a result, this condition is added to $\mathcal{D}_{0t}$ and its associated $\hat\gamma_k(t)$ is set at 0 for all $t \ge 0.029$.

  • When $t = 0.036$, the set $C_{0.036}$ consists of 3 conditions, all of which are determined to have $\hat\gamma_k(0.036) = 0$ in the maximisation step, and are added to $\mathcal{D}_{0t}$.

  • Table 1 further gives the intermediate results for $t = 0.043, 0.046, 0.049, 0.053, 0.056$ to illustrate how the set $C_t$ changes over the iteration. Overall, while the number of conditions belonging to $\mathcal{Z}_t$ and $\underline{\mathcal{Z}}_t$ grows with $t$, the set $\mathcal{D}_{0t}$ also grows. The size of $C_t$ in this example ranges from 0 to 9 for all $t \in [t_L, t_U]$. Thus, the maximisation step (step 2b) is computationally feasible.

  • Note that the table only shows conditions that belong to $C_t \cup \mathcal{D}_{0t}$ at a given $t$. Because $\hat\gamma_k(t) = 1$ for $k \notin C_t \cup \mathcal{D}_{0t}$, these conditions are not shown in the table to conserve space.

  • The iteration stops at $t = t_U = 0.98$, when $\hat\gamma_k(0.98) = 0$ for all $k \in \{1, \ldots, 81\}$.

TABLE 1.

Illustration of the sweep algorithm applied to simulated data with outcomes $y_k \sim \text{binomial}(m_k, \theta_k)$ for condition $x_k$. For each $t$, conditions that belong to $C_t$ are shown (under the columns 'set') along with the PIPE-classifier $\hat\gamma_k(t)$ per step 2b.

xk mk yk t=0.020 t=0.029 t=0.036 t=0.043 t=0.046
set γˆk(t) set γˆk(t) set γˆk(t) set γˆk(t) set γˆk(t)
(0,1,0,1) 34 0 𝒵t 1 𝒵t 1 𝒵t 0 𝒟0t 0 𝒟0t 0
(0,0,0,0) 23 0 𝒵t 1 𝒵t 0 𝒟0t 0 𝒟0t 0 𝒟0t 0
(0,1,0,0) 17 0 𝒵t 1 𝒵t 1 𝒵t 0 𝒟0t 0 𝒟0t 0
(0,0,0,1) 26 1 𝒵t 1 𝒵t 1 𝒵t 0 𝒟0t 0 𝒟0t 0
(0,0,1,2) 15 0 𝒵t 1 𝒵t 1
(0,0,1,0) 22 1 𝒵t 1 𝒵t 1
(0,0,1,1) 1 0 𝒵t 1 𝒵t 1
(0,0,0,2) 29 1 𝒵t 1 𝒵t 1
(1,0,0,0) 14 0 𝒵t 0
xk mk yk t=0.049 t=0.053 t=0.056
set γˆk(t) set γˆk(t) set γˆk(t)
(0,1,0,1) 34 0 𝒟0t 0 𝒟0t 0 𝒟0t 0
(0,0,0,0) 23 0 𝒟0t 0 𝒟0t 0 𝒟0t 0
(0,1,0,0) 17 0 𝒟0t 0 𝒟0t 0 𝒟0t 0
(0,0,0,1) 26 1 𝒟0t 0 𝒟0t 0 𝒟0t 0
(0,0,1,2) 15 0 𝒵t 1 𝒵t 1 𝒵t 1
(0,0,1,0) 22 1 𝒵t 1 𝒵t 1 𝒵t 1
(0,0,1,1) 1 0 𝒵t 1 𝒵t 1 𝒵t 1
(0,0,0,2) 29 1 𝒵t 1 𝒵t 1 𝒵t 0
(1,0,0,0) 14 0 𝒟0t 0 𝒟0t 0 𝒟0t 0
(0,2,0,1) 33 1 𝒵t 1 𝒵t 0 𝒟0t 0
(0,2,0,0) 5 0 𝒵t 0 𝒟0t 0

3.3 |. Sequential random subset maximisation

The feasibility of the sweep algorithm depends on the size of $C_t$ at each $t$. The sweep algorithm can occasionally get stuck at a particular $t$ when $C_t$ is large. For a problem with large $D$ and $K$, evaluating $\Gamma(C_t)$ may not be feasible in general. For these large-$D$, large-$K$ situations, we propose a sequential random subset maximisation (SRSM) method $\hat\gamma^{\mathrm{SRSM}}(t)$ to approximate the $\hat\gamma_{C_t}(t)$ defined in the sweep algorithm. First determine a subset size $K_{\mathrm{sub}}$:

  1. Select random subset: Randomly select a subset $\mathcal{S}_t \subseteq \bar{\mathcal{D}}_t$ with $|\mathcal{S}_t| \le K_{\mathrm{sub}}$, where $\bar{\mathcal{D}}_t \subseteq C_t$ is the set of indices with undetermined $\hat\gamma_k(t)$ on $C_t$ and is initially set to $C_t$.

  2. Maximise: Evaluate $\hat\gamma_{\mathcal{S}_t}(t)$ and set $\hat\gamma^{\mathrm{SRSM}}(t)$ equal to $\hat\gamma_{\mathcal{S}_t}(t)$ on the subset $k \in \mathcal{S}_t$.

  3. Impose partial orders: Let $\mathcal{I}_t \subseteq C_t$ be the set of indices whose values $\hat\gamma_k(t)$ are implied by $\hat\gamma_{\mathcal{S}_t}(t)$ through partial ordering. Update $\hat\gamma^{\mathrm{SRSM}}(t)$ on $\mathcal{I}_t$ accordingly.

  4. Update $\bar{\mathcal{D}}_t \leftarrow \bar{\mathcal{D}}_t \setminus (\mathcal{S}_t \cup \mathcal{I}_t)$, and repeat from step 1 until $\bar{\mathcal{D}}_t = \emptyset$.

  5. Set $\hat\gamma_{C_t}(t) = \hat\gamma^{\mathrm{SRSM}}(t)$ when the algorithm ends.

While there is no theoretical guarantee that the maximisation step (step 2) in SRSM will yield the true $\hat\gamma_{C_t}(t)$, we may repeat the algorithm many times and select the classifiers with the maximum $H_{C_t}\{\hat\gamma^{\mathrm{SRSM}}(t); t\}$. A code sketch follows.
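The sketch below (ours, not the authors' code) implements one SRSM run in R under the simplifying assumption that order_pairs is transitively closed, i.e., it lists every comparable pair $(k, k')$ with $x_k \prec x_{k'}$; it reuses the pipe_classifier() helper from Section 2.1.

```r
# One SRSM run: maximise over small random subsets (step 2), propagate
# implied values by partial ordering (step 3), repeat until all indices
# are determined (step 4).
srsm <- function(p, w, order_pairs, K_sub = 5, eps = 0.5) {
  K <- length(p)
  g <- rep(NA, K)                              # NA = undetermined
  while (anyNA(g)) {
    und <- which(is.na(g))
    S <- und[sample.int(length(und), min(K_sub, length(und)))]
    sp <- order_pairs[order_pairs[, 1] %in% S &
                      order_pairs[, 2] %in% S, , drop = FALSE]
    g[S] <- pipe_classifier(p[S], w[S],
                            matrix(match(sp, S), ncol = 2), eps)
    # zeros propagate downward, ones propagate upward (closure assumed);
    # only undetermined entries are filled
    lo <- order_pairs[which(g[order_pairs[, 2]] == 0), 1]
    g[lo[is.na(g[lo])]] <- 0
    hi <- order_pairs[which(g[order_pairs[, 1]] == 1), 2]
    g[hi[is.na(g[hi])]] <- 1
  }
  g
}
# In practice one repeats many runs and keeps the run with the largest H,
# since a single run carries no optimality guarantee.
```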

SRSM is intended to be applied in conjunction with step 2b of the sweep algorithm. However, it can also serve as a stand-alone algorithm for the evaluation of $\hat\gamma(t)$, by replacing $C_t$ with $\{1, \ldots, K\}$. Table 2 summarises the performance of the SRSM algorithm when applied to the simulated data in Section 3.2 for directly evaluating the full vector $\hat\gamma(t)$ at $t = 0.5$. In this case, as we know the ground truth from the sweep algorithm, the table records how frequently SRSM is correct for different values of $K_{\mathrm{sub}}$. As expected, the algorithm is correct more often with a larger $K_{\mathrm{sub}}$. In practice, where we do not have the ground truth, we require the algorithm only to be correct at least once (instead of most of the time). Therefore, as long as there is a non-trivial likelihood of obtaining the correct $\hat\gamma(t)$ on each SRSM run, the probability of getting the correct answer over repeated runs will be very high. In our example, the likelihood is quite high (at 38%) even when a small subset of $K_{\mathrm{sub}} = 5$ is sampled out of the $K = 81$ possible conditions.

TABLE 2.

Performance of the SRSM algorithm for evaluating $\hat\gamma(0.5)$ in the simulated rehab data ($D = 4$, $K = 81$) using different $K_{\mathrm{sub}}$. The SRSM algorithm is repeated 100 times for each $K_{\mathrm{sub}}$. The ground truth is known from the sweep algorithm.

$K_{\mathrm{sub}}$ 5 7 9 11 12 13 14 15 20
Number of correct classifications 38 45 46 46 51 59 57 57 58

To illustrate how SRSM works with the sweep algorithm for a large $D$, we consider another simulated data set that includes all 7 factors in a rehabilitation chart review, i.e., a total of 648 conditions. The data generation model is described in Section 5.3 below. At $t = 0.05$ in the sweep algorithm, the set $C_{0.05}$ consists of 25 conditions. First, to obtain the ground truth $\hat\gamma_{C_{0.05}}(0.05)$, we enumerated the entire set $\Gamma(C_{0.05})$, identified 67,929 partially ordered classifier ensembles, and evaluated $\hat\gamma_{C_{0.05}}(0.05)$ per (2). Next, we applied SRSM with $K_{\mathrm{sub}} = 5$ out of the $|C_{0.05}| = 25$ possible conditions and repeated the algorithm 100 times. Five of the 100 repetitions yielded the true $\hat\gamma_{C_{0.05}}(0.05)$. With $K_{\mathrm{sub}} = 7$, the number of correct classifications increased to 14. The SRSM algorithm with $K_{\mathrm{sub}} = 7$ and 100 repetitions took less than a minute to run on a local computer, while enumerating $\Gamma(C_{0.05})$ took a few hours on the same machine.

4 |. APPLICATIONS

4.1 |. Estimating population parameters under partial ordering

iPIPE is directly applicable to the estimation of population parameters associated with partially ordered conditions. Generally, let $y_k = (y_{k1}, y_{k2}, \ldots, y_{k,m_k})$ denote a random sample of size $m_k$ from a population with parameter $\theta_k$ and nuisance parameters $\vartheta_k$ under condition $x_k$, where $\theta = (\theta_1, \ldots, \theta_K)$ is nondecreasing in $x$ in terms of partial ordering. Then the PIPE-classifier (2) and its inverse (7) can be evaluated with $p_k(t) = \Pr(\theta_k > t \mid y_k)$ with respect to the posterior marginal distribution of $\theta_k$.

To illustrate, consider normal data $y_k$ with mean $\mu_k$ and variance $\sigma_k^2$. Unconstrained Bayesian inference about $\mu_k$ and $\sigma_k$ may be performed independently for each $k$ based on a semiconjugate prior: $\mu_k \sim N(0, \tau_0^2)$ and $f(\sigma_k) \propto \sigma_k^{-1}$. The posterior distribution of $\mu_k$ and $\sigma_k$ can then be simulated from

$$\mu_k \mid \sigma_k, y_k \sim N(\hat\mu_k, \hat\tau_k^2) \quad \text{and} \quad f(\sigma_k \mid y_k) \propto \hat\tau_k\, \phi(\hat\mu_k / \tau_0)\, \sigma_k^{-1}\, \ell_k(\hat\mu_k, \sigma_k) \tag{9}$$

where $\bar{y}_k$ is the sample mean of the random sample, the function $\phi$ denotes the standard normal density, $\ell_k$ is the normal likelihood given $y_k$, and

$$\hat\mu_k = \frac{m_k / \sigma_k^2}{1/\tau_0^2 + m_k / \sigma_k^2}\, \bar{y}_k \quad \text{and} \quad \hat\tau_k^2 = \left( 1/\tau_0^2 + m_k/\sigma_k^2 \right)^{-1}.$$

Suppose that the $\mu_k$'s are nondecreasing in $x$ in terms of partial ordering, i.e., setting $\theta_k = \mu_k$ with nuisance parameters $\vartheta = (\sigma_1, \ldots, \sigma_K)$. Then we have

$$p_k(t) = \int \Phi\!\left( \frac{\hat\mu_k - t}{\hat\tau_k} \right) f(\sigma_k \mid y_k)\, d\sigma_k, \tag{10}$$

where $\Phi$ is the standard normal distribution function. Instead of evaluating (10) numerically, we can compute $p_k(t)$, for each given $t$, by drawing $\mu_k$ according to the posterior distribution (9).
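In R, the Monte Carlo version of (10) is a one-line average over posterior draws; the helper name pk_normal below is ours.

```r
# Monte Carlo estimate of p_k(t) = P(mu_k > t | y_k) from posterior
# draws of mu_k simulated per (9)
pk_normal <- function(t, mu_draws) mean(mu_draws > t)
```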

In another example, consider a binomial variable $y_k$ with size $m_k$ and response probability $\theta_k$ under condition $k$. Postulating a conjugate beta$(a, b)$ prior on the probability parameter, we can draw unconstrained inference about $\theta_k$ based on the posterior distribution beta$(a + y_k, b + m_k - y_k)$, i.e., $p_k(t) = 1 - \mathrm{Be}(t; a + y_k, b + m_k - y_k)$, where $\mathrm{Be}(\cdot\,; a, b)$ is the distribution function of beta$(a, b)$. To illustrate with a real data application, we consider data on 5,087 users in an app recommender study (Cheung et al., 2018). In this analysis, each user received app recommendations on a weekly basis and their usage was tracked over a four-week period. Specifically, in each of the four weeks, a user would receive $x_{kd} \in \{0, 1, 2\}$ app recommendations, for $d = 1, 2, 3, 4$. For illustration purposes, we consider a dichotomised response, defined as engaging an app in the system at least three times during the week following the four-week period. The left panel of Figure 1 summarises the recommendation patterns $x_k$ and the app use data $(m_k, y_k)$ for each condition: among the 5,087 users, we observe a total of $K = 29$ patterns.

Figure 1.


Analysis of app recommender data in 5,087 users. In the forest plots, solid circles and squares respectively indicate the posterior median of $\theta_k$ for unconstrained estimation and iPIPE (i.e., $\epsilon = 0.5$); each line indicates a 95% credible interval. The conditions are ordered according to the posterior median based on iPIPE. The dotted vertical lines in both plots indicate the largest iPIPE median $\hat\theta_k$ and are given as a reference to indicate variability: the unconstrained estimates are apparently more variable across conditions than iPIPE.

The unconstrained estimates are obtained with a uniform prior on each $\theta_k$ (i.e., $a = b = 1$) and are plotted in the right panel of Figure 1, along with the iPIPE estimates (medians and 95% credible intervals) using the sweep algorithm. The impact of the partial ordering assumption is clearly demonstrated, as the iPIPE estimates differ from the unconstrained estimates in two ways. First, iPIPE identifies conditions that are not effective, assigning them very low estimated response rates, whereas the unconstrained estimates are more variable across conditions. Second, the 95% credible intervals based on iPIPE are much narrower than those based on unconstrained estimation, particularly when $m_k$ is small.

4.2 |. Regression models

This subsection describes applications of iPIPE in regression models that include covariates or confounding variables as well as the multi-factor condition as predictor variables, for different outcome types. First consider linear regression for normal data:

$$y_i = \sum_{k=1}^{K} \alpha_k I(\text{condition}_i = x_k) + \varphi^T z_i + e_i, \tag{11}$$

where $y_i$ and $\text{condition}_i$ respectively denote the response and the condition of subject $i$ with covariates $z_i$ and normal noise $e_i$ with variance $\sigma^2$, for $i = 1, \ldots, n$. Unconstrained Bayesian inference can be performed with the standard non-informative prior, which is uniform on $(\alpha, \varphi, \log \sigma)$, or equivalently, $f(\alpha, \varphi, \sigma^2) \propto \sigma^{-2}$ (Gelman et al., 1995). However, when $K$ is large and the numbers of observations for some conditions are small, a proper prior on the $\alpha_k$'s may be used instead. Generally, it is easy to draw from the posterior distribution under other prior distributions, such as normal, using standard software such as RStan (Stan Development Team, 2021); see Section 5.1 for an example.

Under model (11), the effects of the conditions are expressed in terms of the intercepts $\alpha_k$. If these intercepts are partially ordered according to the $K$ conditions, we can set $\theta_k = \alpha_k$, and the response surface $\theta$ can be estimated by $\hat\theta$ with $p_k(t) = E\{I(\alpha_k > t) \mid y_1, \ldots, y_n\}$ for $k = 1, \ldots, K$, where the expectation is taken with respect to the unconstrained posterior distribution of $\alpha_k$. The nuisance parameters in this application are $\vartheta = (\varphi, \sigma^2)$.

In some applications, one of the conditions (say $x_1$) is a control condition and the interest is in estimating the effect of $x_k$ relative to $x_1$, i.e., $\alpha_k - \alpha_1$. As such, we may impose no constraint between $\alpha_1$ and the other $\alpha_k$'s; rather, partial ordering is applied to $\theta_k = \alpha_k - \alpha_1$ for $k = 2, \ldots, K$. That is, the response surface of interest, $\theta = (\theta_2, \ldots, \theta_K)$, has $K - 1$ parameters and can be estimated using iPIPE with $p_k(t) = E\{I(\alpha_k - \alpha_1 > t) \mid y_1, \ldots, y_n\}$ for $k = 2, \ldots, K$, where the expectation is taken with respect to the unconstrained posterior distribution of $\alpha_k - \alpha_1$. This illustrates how iPIPE can be applied to constraints on different contrasts in model (11) in different applications, once the unconstrained posterior of the $\alpha_k$'s is obtained.
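For instance, with posterior draws of the intercepts arranged as a matrix, these contrast probabilities reduce to Monte Carlo averages; the sketch below uses illustrative names (alpha_draws, pk_contrast).

```r
# p_k(t) for contrasts theta_k = alpha_k - alpha_1, given an S x K
# matrix alpha_draws of unconstrained posterior draws
pk_contrast <- function(t, k, alpha_draws)
  mean(alpha_draws[, k] - alpha_draws[, 1] > t)
```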

Generalised linear models are commonly used for regression analysis of non-normal response data, including logistic and probit regression for binary data, Poisson regression for count data, and gamma regression for nonnegative continuous data, as well as linear regression. The expected response $E(y_i)$ is $\eta$-linear in the regression coefficients, i.e.,

$$\eta\{E(y_i)\} \equiv \eta_i = \sum_{k=1}^{K} \alpha_k I(\text{condition}_i = x_k) + \varphi^T z_i \tag{12}$$

where $\eta$ is a known link function (McCullagh and Nelder, 1989). There is a large literature on Bayesian analysis of generalised linear models. Ibrahim and Laud (1991), for example, give an in-depth discussion of the use of the Jeffreys prior. Due to the large number $K$ of conditions, however, proper priors such as normal may be used for the $\alpha_k$'s; see Dellaportas and Smith (1993), who discuss an application using Gibbs sampling to compute the posterior. The response surface $\theta$ is to be defined in terms of the $\alpha_k$'s according to the application and estimated using iPIPE in a manner analogous to the linear model described above.

For right-censored survival data, Cox's model assumes that the hazard function at time $s$ is given by $\lambda_0(s) \exp(\eta_i)$, where $\lambda_0(s)$ is the baseline hazard and the conditions and covariates $z_i$ have multiplicative effects on the hazards via $\eta_i$, defined analogously to (12). While the focus of inference is often on the regression coefficients $(\alpha, \varphi)$, many Bayesian approaches to handling $\lambda_0(s)$ have been considered, including parametric models (Dellaportas and Smith, 1993), nonparametric prior processes such as the gamma process (Burridge, 1981; Sinha and Dey, 1997), and avoiding modeling $\lambda_0(s)$ via the use of partial likelihood (Sinha et al., 2003). For example, the partial likelihood under proportional hazards is free of the nuisance $\lambda_0$ and can be expressed as

$$\mathcal{L}(\alpha, \varphi) = \prod_{i=1}^{n} \left\{ \frac{\exp(\eta_i)}{\sum_{j \in \mathcal{R}(y_i)} \exp(\eta_j)} \right\}^{\delta_i} = \prod_{i=1}^{n} \left[ \frac{\exp\{\sum_{k=1}^{K} \alpha_k I(\text{condition}_i = x_k) + \varphi^T z_i\}}{\sum_{j \in \mathcal{R}(y_i)} \exp\{\sum_{k=1}^{K} \alpha_k I(\text{condition}_j = x_k) + \varphi^T z_j\}} \right]^{\delta_i} \tag{13}$$

$$= \prod_{i=1}^{n} \left[ \frac{\exp\{\sum_{k=2}^{K} (\alpha_k - \alpha_1) I(\text{condition}_i = x_k) + \varphi^T z_i\}}{\sum_{j \in \mathcal{R}(y_i)} \exp\{\sum_{k=2}^{K} (\alpha_k - \alpha_1) I(\text{condition}_j = x_k) + \varphi^T z_j\}} \right]^{\delta_i} \tag{14}$$

where $y_i$ is the minimum of the censoring time and survival time of subject $i$, $\delta_i$ is the indicator of observing the survival time, and $\mathcal{R}(s)$ is the risk set at time $s$. Note that when $\lambda_0$ is unspecified, and assuming that $\text{condition}_i \in \{x_1, \ldots, x_K\}$ so that $\sum_{k=1}^{K} I(\text{condition}_i = x_k) = 1$, the right-hand side of expression (13) is over-parameterised, as the term $\alpha_1$ can be absorbed into the baseline hazard function. Unconstrained Bayesian inference about $\theta_k = \alpha_k - \alpha_1$ (and $\varphi$) can be based on the partial likelihood (14) and the corresponding approximate posterior density $\propto \mathcal{L}(\alpha, \varphi) f(\alpha, \varphi)$, which is a limiting marginal posterior of $(\alpha, \varphi)$ under a fully Bayesian approach with a diffuse gamma process prior on $\lambda_0$ (Sinha et al., 2003). If the response surface $\theta$ thus defined is assumed nondecreasing in $x$ in terms of partial ordering, it can be estimated using iPIPE based on the unconstrained posterior draws of the $\alpha_k$'s with $p_k(t) = E\{I(\alpha_k - \alpha_1 > t) \mid y_1, \ldots, y_n\}$ for $k = 2, \ldots, K$, in the same way as in the linear and generalised linear models. Note that if, in some applications, the response surface of interest is $\theta_k \equiv \alpha_k$, a parametric form of $\lambda_0$ is needed to ensure identifiability; see, for example, Dellaportas and Smith (1993), who propose Gibbs sampling for the proportional hazards model with Weibull survival times, where the full conditionals are simulated using adaptive rejection sampling (Gilks and Wild, 1992).

4.3 |. Covariate-dependent response surface

Model (11) can be extended to accommodate situations where the multi-factor condition interacts with the covariates, i.e., having

$$y_i = \sum_{k=1}^{K} \alpha_k I(\text{condition}_i = x_k) + \varphi^T z_i + \sum_{k=1}^{K} I(\text{condition}_i = x_k)\, \beta_k^T z_i + e_i, \quad \text{for } i = 1, \ldots, n. \tag{15}$$

For the coefficients of the interaction terms, we may postulate $\beta_1, \ldots, \beta_K \sim N(0, \sigma_B^2 I)$ a priori, independently of the prior specified for the main effects $\alpha_k$ and $\varphi$ and the variance term $\sigma$ in model (11). A relatively informative prior, i.e., a small $\sigma_B$, corresponds to the assumption that the effects of the $K$ conditions follow nearly the same order under all $z_i$. In the special case when $\beta_k \equiv 0$ (i.e., a degenerate prior), the response surface $\theta_k(z_i) = E\{y_i \mid \text{condition}_i = x_k, z_i\}$ will follow the same full order as the $\alpha_k$'s for each $z_i$. Generally, the full order of a covariate-dependent response surface

$$\theta(z_i) = \left( \alpha_1 + (\varphi + \beta_1)^T z_i,\ \alpha_2 + (\varphi + \beta_2)^T z_i,\ \ldots,\ \alpha_K + (\varphi + \beta_K)^T z_i \right), \tag{16}$$

varies depending on $z_i$, even though it is subject to the same partial ordering constraint with respect to the condition $x$. The response surface (16) can be estimated using iPIPE, for each given $z_i$, with $p_k(t) = \Pr\{\alpha_k + (\varphi + \beta_k)^T z_i > t \mid y\}$, where the probability is taken with respect to the unconstrained posterior distribution of $(\alpha_k, \varphi, \beta_k)$. Applications with generalised linear models and Cox's model are analogous.

4.4 |. Hierarchical models for repeated measurements

In situations where an individual has repeated observations under different conditions, one may estimate the individual response surface using hierarchical models. Let $y_{ij}$ be the $j$th measurement of individual $i$ under $\text{condition}_{ij}$. An outcome model for these individuals can be expressed as

$$y_{ij} = \sum_{k=1}^{K} \alpha_{ik} I(\text{condition}_{ij} = x_k) + e_{ij} \tag{17}$$

where the individual effects $\alpha_{ik}$ of condition $x_k$ can be viewed as random effects that are potentially dependent on each other via underlying risk factors or confounding variables $z_i$, and are possibly correlated, with noise $e_{ij} \sim N(0, \sigma^2)$. As (17) is quite general, we illustrate with a concrete example in which we examined the individual effects of sedentary breaks on cognitive performance in $n = 11$ participants. A sedentary break condition was defined by 2 factors over an 8-hour period: break frequency $x_{k1}$ and duration $x_{k2}$. In addition to a control condition with no sedentary breaks, denoted as $x_1 = (x_{11}, x_{12}) = (0, 0)$, each factor had two levels in the experiments:

  • low frequency ($x_{k1} = 1$; a break every 60 minutes) vs high frequency ($x_{k1} = 2$; a break every 30 minutes);

  • low break duration ($x_{k2} = 1$; 1 minute per break) vs high duration ($x_{k2} = 2$; 5 minutes per break).

Thus, each participant would be evaluated under $K = 5$ conditions on 5 different days, with a 4–14 day washout period between conditions; see Table 3 for the list of conditions $x_k$ for $k = 2, 3, 4, 5$. We were interested in estimating the effects of sedentary breaks relative to the control condition in terms of the change in the Symbol Digit Modalities Test (SDMT) over an 8-hour period. While it was reasonable to assume that the change in SDMT increases with each factor for each individual, we did not impose constraints between the control $x_1$ and the other conditions, as we were interested in estimating $\theta_{ik} \equiv \alpha_{ik} - \alpha_{i1}$ for $k = 2, 3, 4, 5$ for each $i$.

TABLE 3.

Estimation of the population-level response surface $\theta(z_i)$ using the sedentary break data in 11 individuals: 6 men ($z_i = 0$) and 5 women ($z_i = 1$); $m_k$ is the number of observations available for a given condition $k$. For the unconstrained posterior and the constrained posterior, the median ('med') of the posterior draws of $\theta_k$ is reported along with the 0.025 and 0.975 posterior quantiles ('95% int'). For iPIPE, the respective quantities are obtained by setting $\epsilon = 0.5, 0.025, 0.975$.

(a) Dose-response analysis results for men ($z_i = 0$)
Condition k xk1,xk2 mk Unconstrained iPIPE, θˆk Constrained posterior
med 95% int med 95% int med 95% int
2 (1,1) 6 2.85 (−2.37, 10.1) 0.94 (−4.53, 7.57) −0.37 (−5.68, 4.84)
3 (1,2) 6 −0.12 (−5.33, 6.67) 0.94 (−4.53, 7.57) 1.94 (−3.26, 7.50)
4 (2,1) 6 −0.32 (−5.46, 6.54) 0.94 (−4.53, 7.57) 1.98 (−3.17, 7.43)
5 (2,2) 5 5.74 (−0.17, 12.4) 5.74 (−0.17, 12.4) 6.86 (1.22, 14.1)
(b) Dose-response analysis results for women ($z_i = 1$)
Condition k xk1,xk2 mk Unconstrained iPIPE, θˆk Constrained posterior
med 95% int med 95% int med 95% int
2 (1,1) 5 2.40 (−3.35, 9.73) 1.85 (−3.83, 9.74) 1.04 (−2.52, 2.15)
3 (1,2) 4 5.60 (−0.58, 12.82) 2.87 (−3.27, 10.7) 2.74 (1.10, 5.80)
4 (2,1) 5 1.56 (−4.48, 9.38) 1.85 (−3.83, 9.74) 1.33 (0.10, 4.45)
5 (2,2) 3 0.44 (−8.22, 7.01) 2.87 (−3.27, 10.7) 2.77 (2.74, 8.03)

We thus rewrite (17) as

$$y_{ij} = \alpha_{i1} + \sum_{k=2}^{K} \theta_{ik} I(\text{condition}_{ij} = x_k) + e_{ij} \tag{18}$$

and postulate that each $\theta_i$ is nondecreasing in $x$ in terms of partial ordering. Under the parameterisation (18), $\alpha_{i1}$ indicates the mean response of participant $i$ under the control condition. For the individual-level parameters, we postulate a priori that:

  • $\alpha_{11}, \ldots, \alpha_{n1} \sim N(\mu_A, \sigma_A^2)$;

  • $\theta_{ik} \sim N(\psi_k + \xi_k z_i, \sigma_B^2)$ for $k = 2, 3, 4, 5$ for each individual $i$, where $z_i$ is the gender of individual $i$.

That is, the condition effects account for gender in addition to the variability among individuals in the population. Further, the population-level parameters, namely $\mu_A$, $\sigma_A$, $\{\psi_k, \xi_k : k = 2, 3, 4, 5\}$, $\sigma_B$, and $\sigma$, have the following prior distributions:

  • $\mu_A$ has an improper flat prior, i.e., $f(\mu_A) \propto 1$;

  • $\psi_2, \psi_3, \psi_4, \psi_5 \sim N(0, 1000)$;

  • $\xi_2, \xi_3, \xi_4, \xi_5 \sim N(0, 1000)$;

  • all variance parameters $\sigma_A^2, \sigma_B^2, \sigma^2$ follow an inverse chi-squared distribution with 1 degree of freedom.

The “layer” of population-level parameters in the hierarchical model facilitates pooling data across individuals, but does not take advantage of partial ordering. To estimate a monotone individual response surface $\theta_i$, one can evaluate $\hat\theta_i$ using iPIPE according to the PIPE-classifier ensemble (2) and its inverse (7) with $p_{ik}(t) = E\{I(\theta_{ik} > t) \mid y\}$, where the expectation is taken with respect to the posterior of $\theta_{ik}$, for $k = 2, \ldots, K$.
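Given a fitted RStan object with a parameter block named theta indexed by individual and condition, the individual-level $p_{ik}(t)$ is again a Monte Carlo average over draws; the object and parameter names below are illustrative, not the authors' code.

```r
# p_ik(t) from posterior draws of theta[i, k] under model (18)
# draws <- rstan::extract(fit)$theta   # S x n x K array of draws
pik <- function(t, i, k, draws) mean(draws[, i, k] > t)
```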

In addition, we can make constrained inference about the population-level parameters using iPIPE. Specifically, we may define the population-level response surface

$$\theta(z_i) = \left( \theta_2(z_i), \theta_3(z_i), \theta_4(z_i), \theta_5(z_i) \right) \equiv \left( \psi_2 + \xi_2 z_i,\ \psi_3 + \xi_3 z_i,\ \psi_4 + \xi_4 z_i,\ \psi_5 + \xi_5 z_i \right)$$

which depends on the covariate $z_i$ and is subject to the same partial ordering constraints as the individual $\theta_{ik}$'s. Then, for each given $z_i$, the response surface $\theta(z_i)$ can be estimated using iPIPE with $p_k(t) = E\{I(\psi_k + \xi_k z_i > t) \mid y\}$ for $k = 2, \ldots, K$, where the expectation is defined with respect to the unconstrained posterior of $(\psi_k, \xi_k)$.

The sedentary break data were fitted using model (18) with the above hierarchical structure in RStan, with 4 chains each having 20,000 iterations after 5,000 warmup samples; iPIPE was applied with $\epsilon = 0.5$ (point estimate) and $\epsilon = 0.025, 0.975$ (95% credible interval), and $w_k = m_k$ (the number of observations for condition $x_k$ across individuals). As a comparison method, we also considered the constrained posterior that includes only the $\theta_{ik}$ draws from the RStan samples that meet partial ordering. Similar analyses were conducted for the population-level $\theta_k(z_i)$'s.

The analysis results for the population-level response surface $\theta(z_i)$ are given in Table 3. When the unconstrained estimates do not violate any partial ordering constraint, the iPIPE estimates are similar to the unconstrained estimates; e.g., condition 4 in Table 3(a). When the unconstrained estimates violate partial ordering, iPIPE pools data across conditions and equalises the estimates; e.g., in Table 3(b), iPIPE results in $\hat\theta_1 = \hat\theta_3$ for conditions 1 and 3. In this regard, iPIPE behaves similarly to PAVA. In contrast, estimation based on the naive constrained posterior reverses the order of the unconstrained estimates for conditions 1 and 3. Similarly, the constrained posterior leads to significant conclusions (i.e., 95% credible intervals excluding 0) for conditions 3 and 4 in Table 3(b), which could be an artifact of the constrained sampling scheme rather than the data.

Figure 2 displays the individual response surface estimates of 11 participants using the unconstrained posterior, iPIPE, and the constrained posterior. By imposing partial ordering, iPIPE (Figure 2(b)) reduces between-individual variability and produces estimates in a range similar to the unconstrained fits (Figure 2(a)). In contrast, the constrained posterior seems to lead to artificially exaggerated dose-response relationships at individual levels; e.g., see estimates for individual “f”.

Figure 2.


Estimated individual response surfaces $\theta_i$ in the sedentary break study, for $i = 1, \ldots, 11$. The estimates for participant “f” are colour-coded for visibility.

5 |. SIMULATION STUDY

5.1 |. Simulation setting 1: D=2 with equal sample size

In this section, we evaluate the performance of iPIPE compared with other methods using simulation. The first simulation study examines situations with small $D$ and $K$, namely $D = 2$ with $x_{k1} \in \{1, 2, 3\}$ and $x_{k2} \in \{1, 2\}$, giving $K = 6$ conditions. Each condition has an equal number of $m_k = 5$ independent observations, generated from a normal distribution with mean $\theta_k$ and a common standard deviation $\sigma = 2$, where $\theta_k = \beta_1 x_{k1} + \beta_2 x_{k2} + \beta_{12} \log(x_{k1} x_{k2})$. Four scenarios of the $\beta$'s are considered in the simulation and are given in Table 4.
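A minimal R sketch of this data-generating process (with the scenario 1 values shown) is:

```r
# Simulation setting 1: K = 6 conditions on a 3 x 2 grid, with m_k = 5
# normal observations per condition (scenario 1: b1 = 1, b2 = 2, b12 = 0)
set.seed(1)
grid <- expand.grid(x1 = 1:3, x2 = 1:2)
theta <- with(grid, 1 * x1 + 2 * x2 + 0 * log(x1 * x2))
y <- lapply(theta, function(th) rnorm(5, mean = th, sd = 2))
```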

TABLE 4.

Simulation setting 1: D=2, K=6 with equal sample size

(a) Scenario 1: $\beta_1 = 1$, $\beta_2 = 2$, $\beta_{12} = 0$
Condition k True Unconstrained iPIPE, θˆk Constrained posterior PAVA
xk1 xk2 θk bias mse cov bias mse cov bias mse cov bias mse
1 1 3.0 −0.07 0.79 0.96 −0.26 0.60 0.98 −0.90 1.17 0.77 −0.27 0.60
2 1 3.4 0.01 0.82 0.96 −0.01 0.43 0.99 −0.12 0.30 0.98 −0.01 0.43
3 1 3.7 0.04 0.84 0.96 0.23 0.60 0.98 0.67 0.81 0.86 0.23 0.59
1 2 5.0 −0.10 0.75 0.95 −0.27 0.53 0.98 −0.69 0.75 0.87 −0.27 0.54
2 2 5.4 −0.02 0.81 0.96 −0.04 0.41 0.99 0.07 0.24 0.98 −0.05 0.41
3 2 5.7 0.01 0.83 0.95 0.24 0.60 0.98 0.93 1.19 0.79 0.23 0.59
(b) Scenario 2: $\beta_1 = 1$, $\beta_2 = 2$, $\beta_{12} = -0.25$
Condition k True Unconstrained iPIPE, θˆk Constrained posterior PAVA
xk1 xk2 θk bias mse cov bias mse cov bias mse cov bias mse
1 1 3.0 −0.05 0.80 0.95 −0.29 0.57 0.97 −1.00 1.30 0.76 −0.29 0.57
2 1 3.2 −0.01 0.77 0.96 −0.01 0.36 0.99 −0.11 0.20 0.99 −0.01 0.36
3 1 3.5 0.03 0.77 0.96 0.26 0.52 0.99 0.67 0.81 0.85 0.26 0.51
1 2 5.0 −0.05 0.82 0.95 −0.33 0.61 0.97 −0.93 1.11 0.77 −0.34 0.61
2 2 5.1 −0.03 0.79 0.96 −0.01 0.34 0.99 0.07 0.20 0.99 −0.01 0.34
3 2 5.2 0.01 0.78 0.97 0.31 0.54 0.98 1.06 1.39 0.66 0.30 0.54
(c) Scenario 3: $\beta_1 = 0$, $\beta_2 = 2$, $\beta_{12} = 0.20$
Condition k True Unconstrained iPIPE, θˆk Constrained posterior PAVA
xk1 xk2 θk bias mse cov bias mse cov bias mse cov bias mse
1 1 2.0 −0.10 0.81 0.95 −0.36 0.62 0.97 −1.08 1.46 0.71 −0.37 0.62
2 1 2.1 −0.02 0.85 0.95 −0.05 0.36 0.99 −0.12 0.24 0.98 −0.05 0.36
3 1 2.2 0.00 0.87 0.95 0.31 0.59 0.98 0.87 1.09 0.81 0.31 0.59
1 2 4.0 −0.05 0.78 0.97 −0.27 0.59 0.98 −0.74 0.93 0.85 −0.27 0.59
2 2 4.3 −0.01 0.76 0.96 −0.02 0.36 0.99 0.10 0.28 0.99 −0.02 0.36
3 2 4.4 0.06 0.75 0.96 0.30 0.55 0.97 1.03 1.35 0.71 0.30 0.54
(d) Scenario 4: $\beta_1 = 1$, $\beta_2 = 0$, $\beta_{12} = 0.50$
Condition k True Unconstrained iPIPE, θˆk Constrained posterior PAVA
xk1 xk2 θk bias mse cov bias mse cov bias mse cov bias mse
1 1 1.0 0.02 0.78 0.95 −0.30 0.56 0.97 −1.13 1.58 0.65 −0.30 0.56
2 1 1.8 −0.06 0.82 0.96 −0.21 0.42 0.99 −0.58 0.50 0.95 −0.22 0.42
3 1 2.3 0.05 0.80 0.96 0.05 0.41 0.99 0.19 0.24 0.98 0.05 0.42
1 2 1.0 −0.01 0.74 0.96 0.19 0.44 0.99 0.14 0.28 0.99 0.18 0.44
2 2 2.1 0.03 0.84 0.95 0.11 0.43 0.99 0.37 0.33 0.95 0.11 0.43
3 2 2.8 −0.07 0.66 0.98 0.16 0.47 0.99 0.99 1.22 0.71 0.16 0.47

mse, mean squared error; cov, coverage probability.

Unconstrained estimates of $\theta_k$ are obtained by fitting a Bayesian linear model with $\theta_1, \ldots, \theta_6 \sim N(0, 1000)$ and $\sigma^2$ following an inverse chi-squared distribution with 1 degree of freedom. We ran 4 chains in RStan, each having 1,250 iterations after discarding 1,000 warmup samples (i.e., 5,000 samples from the unconstrained posterior in total). The iPIPE point estimates are then obtained with $\epsilon = 0.5$ and interval estimates with $\epsilon = 0.025$ and $0.975$. We also evaluate the constrained posterior estimates that include only the posterior draws (out of 5,000 total) that meet partial ordering. In addition, we consider the frequentist PAVA as a comparison method.

The estimation properties of these methods are compared in Table 4. Overall, iPIPE consistently yields a smaller mean squared error (mse) than the unconstrained estimates; the efficiency gain in terms of mse is quite substantial, with reductions greater than 50% under some scenarios. The bias of iPIPE is small relative to its mse. In contrast, the constrained posterior median can yield large bias, which in turn results in a large mse. Additionally, even though it is feasible to evaluate the constrained posterior estimates in this simulation because of the low dimension, the proportions of draws that meet partial ordering are low, with means ranging from 4% to 6% in the scenarios considered.

Finally, it is noteworthy that the estimation properties (bias and mse) of iPIPE are very similar to PAVA, while the former naturally produces interval estimation for inference. The simulation shows that 95% credible intervals of iPIPE achieve nominal (if conservative) coverage probability, whereas biases of the constrained posterior estimates lead to lower-than-nominal coverage probability.

5.2 |. Simulation setting 2: D=4 with uneven sample sizes

The second simulation study examines iPIPE when $D = 4$ with $x_{kd} \in \{1, 2\}$ and $K = 16$ conditions with unequal sample sizes $m_k$. This represents situations with sparse sampling of some conditions. A binomial outcome $y_k$ was generated with size $m_k$ and probability $\theta_k$ for condition $k$ in each simulation replicate. Unconstrained estimates of $\theta_k$ are obtained as the median of the beta posterior assuming a uniform prior. Correspondingly, iPIPE estimates are obtained with $\epsilon = 0.5$ and $w_k = m_k$. In this case, it is infeasible to evaluate the constrained posterior by using only draws that meet partial ordering, because the partial ordering constraints are restrictive and the acceptance rate is extremely small. Similarly, it is not straightforward to implement PAVA in this setting.

Figure 3 plots the distributions of the unconstrained and iPIPE point estimates under a given $\theta$ based on 1,000 simulation replicates. Overall, the iPIPE estimates exhibit smaller variability than the unconstrained estimates, without inducing noticeable bias. The reduction in variability is pronounced when $m_k$ is small; for example, for $x_k = (2, 2, 2, 2)$ and $x_k = (1, 1, 1, 2)$, where $m_k = 5$. Additional simulation scenarios (given in the web-based supporting material) confirm similar observations.

Figure 3.


Simulation setting 2: $D = 4$, $K = 16$ with unequal sample sizes and binomial outcomes. The horizontal axis of the box plots indicates the difference between the true $\theta_k$ and the respective point estimates for each $k$.

In addition to point estimates, 95% credible intervals are obtained for the unconstrained method and iPIPE. The coverage probabilities for all 16 conditions range from 0.96 to 0.99 for iPIPE, and from 0.94 to 0.98 for the unconstrained method. While the iPIPE estimates appear to be more conservative, they have narrower intervals on average: the average width over all 16 conditions is 0.25 for iPIPE, compared with 0.30 for the unconstrained method.

5.3 |. Simulation setting 3: D=7 with uneven sample sizes

In this subsection, we conduct simulations for settings with $D = 7$ and $K = 648$ with binomial outcomes, as described in Section 1. In the simulated data, for condition $k$, we first generate $m_k = \lfloor 34 u_k \rfloor + 1$, where the $u_k$'s are independent uniform(0, 1), and keep $m_k$ fixed across simulation replicates. In each replicate, we draw $y_k \sim \text{binomial}(m_k, \theta_k)$ where

$$\text{logit}(\theta_k) = -5 + 1.5 \left( \sum_{d=1}^{7} x_{kd} \right)^{0.5} + 0.75\, x_{k1}^2 + 0.5\, x_{k2} x_{k3}. \tag{19}$$

We analyse each simulated data set using iPIPE, implemented by the sweep/SRSM algorithm with $K_{\mathrm{sub}} = 5$ and 100 repetitions, as well as the unconstrained posterior estimate, with $\theta_k \sim \text{uniform}(0, 1)$ a priori. Figure 4 shows the results based on 100 simulated data sets. Both methods have similar magnitudes of bias. However, the unconstrained median demonstrates a clear trend of positive bias when the true $\theta_k$ is small and negative bias when the true $\theta_k$ is large, when $m_k$ is small so that the uniform prior has large influence. In contrast, the biases of iPIPE are much attenuated for conditions with small $m_k$. This translates into a smaller mse for iPIPE, noticeably when $m_k$ is small.
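A minimal R sketch of this data-generating process follows; the specific rounding of $m_k$ to an integer is our assumption.

```r
# Simulation setting 3: D = 7, K = 648 conditions; m_k is fixed across
# replicates and y_k is drawn per (19) in each replicate
set.seed(1)
x <- as.matrix(do.call(expand.grid,
                       c(rep(list(0:2), 4), rep(list(0:1), 3))))  # 648 rows
m <- floor(34 * runif(nrow(x))) + 1          # integer m_k (assumed rounding)
eta <- -5 + 1.5 * sqrt(rowSums(x)) + 0.75 * x[, 1]^2 + 0.5 * x[, 2] * x[, 3]
y <- rbinom(nrow(x), size = m, prob = plogis(eta))
```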

Figure 4.


Simulation setting 3: $D = 7$, $K = 648$ with unequal sample sizes and binomial outcomes. Each circle in the figures represents a condition, and the size of the circle is proportional to $m_k$.

The respective average coverage probabilities for the unconstrained method and iPIPE are 0.95 and 0.99. While iPIPE appears to be conservative, it achieves the nominal level with a higher precision (average interval width 0.33) than the unconstrained method (average interval width 0.38).

5.4 |. Simulation setting 4: The effect of dimension D

Finally, we consider settings with conditions defined by $D$ binary factors, i.e., $x_{kd} \in \{1, 2\}$ for each $d = 1, \ldots, D$, where $D = 10, 11, 12$, with total sample size $N = 5000$ or $10000$. An objective of this simulation study is to examine the impact of $D$ on the performance of iPIPE. There are $2^D$ possible conditions for a given $D$. We first generate the $m_k$'s from a multinomial distribution with size $N$ and probability $1/2^D$ for each $k$, and keep the $m_k$'s fixed across simulation replicates. In each replicate, we generate a random sample of size $m_k$ under condition $k$ from an exponential distribution with rate $\theta_k$, i.e., $f(y_{kj} \mid \theta_k) = \theta_k e^{-\theta_k y_{kj}}$ for $y_{kj} > 0$ and $j = 1, \ldots, m_k$, where

$$\log \theta_k = -2.5 + \sum_{d=1}^{D} \frac{x_{kd}}{2^{d+1}} + \frac{x_{k1}}{2^{D}} \sum_{d=1}^{D} \frac{x_{kd}}{d} + \frac{x_{k2}}{2^{2D}} \sum_{d=1}^{D} \frac{x_{kd}}{d} + 0.125 \left( \sum_{d=3}^{D} x_{kd} \right)^{1/(D-2)}. \tag{20}$$

For the unconstrained Bayesian inference, we postulate the $\theta_k$'s to be exchangeable Gamma variables with shape 0.1 and scale 10 a priori, so the posterior is Gamma with shape $m_k + 1$ and scale $10/(1 + 10 \sum_{j=1}^{m_k} y_{kj})$. The iPIPE point estimates are then obtained with $\epsilon = 0.5$ and interval estimates with $\epsilon = 0.025$ and $0.975$, using the sweep/SRSM algorithm with $K_{\mathrm{sub}} = 7$ and 50 repetitions. iPIPE and the unconstrained Bayesian inference are evaluated based on 50 simulation replicates.
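Under this conjugate form, $p_k(t)$ is available in closed form; a small R helper (our naming) following the posterior shape and scale stated above:

```r
# p_k(t) = P(theta_k > t | y_k) under the Gamma posterior of setting 4,
# using the posterior shape and scale as stated in the text
pk_gamma <- function(t, y_k) {
  shape <- length(y_k) + 1
  scale <- 10 / (1 + 10 * sum(y_k))
  1 - pgamma(t, shape = shape, scale = scale)
}
```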

Both methods have similar magnitudes of bias, which appear to remain stable as $D$ increases (Figure 5). In contrast, iPIPE has a much smaller mse than unconstrained inference, especially as $D$ increases (Figure 6). We note that these are simulation scenarios with sparse observations. For example, when $D = 12$ and $N = 5000$, there are $K = 2886$ conditions with at least one observation, i.e., about 1.7 observations per condition. Hence, iPIPE improves upon unconstrained inference by reducing variability while keeping bias similar.

Figure 5.


Simulation setting 4: bias vs true $\theta_k$ for $D = 10, 11, 12$ and $N = 5000, 10000$. Each circle in the figures represents a condition, and the size of the circle is proportional to $m_k$.

Figure 6.


Simulation setting 4: root mean squared error vs true $\theta_k$ for $D = 10, 11, 12$ and $N = 5000, 10000$. Each circle in the figures represents a condition, and the size of the circle is proportional to $m_k$.

Table 5 shows the aggregate performance of the two methods in terms of bias, mse, coverage probability, and mean width of 95% credible intervals, averaged across all $\theta_k$'s for each pair $(D, N)$. The average biases are small relative to mse and remain bounded. As expected, mse increases as $D$ increases, as these situations reflect increasingly sparse observations per condition. However, the performance of iPIPE relative to unconstrained inference improves quite substantially as $D$ increases, reflecting the information contained in the partial ordering assumption. Additionally, doubling $N$ from 5000 to 10000 reduces the average mse by 40% to 50% for all $D$ considered. For iPIPE, we also examine the estimation of the PIPE-classifier ensemble $\gamma_k(t)$ by $\hat\gamma_k(t)$ in terms of the average classification error (ACE) across all $K$ conditions, defined as

$$\mathrm{ACE} = \frac{1}{K} \sum_{k=1}^{K} \text{proportion of } \{\hat\gamma_k(t) \ne \gamma_k(t)\}. \tag{21}$$
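In R, given replicate-by-condition matrices of estimated and true classifiers, (21) is a two-line computation (the names are ours):

```r
# ACE per (21): rows index simulation replicates, columns index conditions
ace <- function(gamma_hat, gamma_true)
  mean(colMeans(gamma_hat != gamma_true))
```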

TABLE 5.

Simulation setting 4: D=10, 11, 12 with sparse sampling

D N K Unconstrained iPIPE, $\hat\theta_k$
bias mse cov width bias mse cov width ACE¯
10 5000 1014 0.18 1.10 0.96 3.2 0.067 0.13 0.99 2.40 0.029
10000 1024 0.095 0.45 0.95 1.93 0.087 0.092 0.99 1.72 0.026
11 5000 1876 0.31 1.94 0.96 5.03 −0.021 0.18 0.99 3.16 0.039
10000 2034 0.20 1.15 0.96 3.15 0.034 0.12 0.99 2.35 0.029
12 5000 2886 0.38 2.42 0.97 6.50 −0.14 0.26 0.99 3.75 0.056
10000 3746 0.31 1.96 0.96 5.02 −0.081 0.17 0.99 3.03 0.042

K indicates the number of conditions with mk>0.

bias, average bias; mse, average mean squared error; cov, average coverage probability; width, average mean width of 95% credible intervals; ACE¯, average of ACE (21) over 50 simulation replicates.

Table 5 reports the average ACE ($\overline{\mathrm{ACE}}$), calculated by averaging ACE across all simulation replicates. $\overline{\mathrm{ACE}}$ correlates with the average mse for estimating $\theta$: it increases as $D$ increases and $N$ decreases, but is low in all the scenarios we considered. One can improve $\overline{\mathrm{ACE}}$ and estimation accuracy by increasing $K_{\mathrm{sub}}$ and the number of repetitions in the SRSM algorithm. For example, we ran the simulation using 100 repetitions (instead of 50) in SRSM for the scenario $D = 12$ and $N = 5000$ and obtained $\overline{\mathrm{ACE}} = 0.052$ (vs. 0.056) and an average mse of 0.24 (vs. 0.26). While the improvement is minimal, because 50 repetitions seem to be adequate, it suggests $\overline{\mathrm{ACE}}$ is a good proxy for the adequacy of the SRSM algorithm. An advantage of $\overline{\mathrm{ACE}}$ over mean squared errors is that $\overline{\mathrm{ACE}}$ takes values in $[0, 1]$, thus providing an index that facilitates benchmarking and interpreting the accuracy of the method.

Finally and importantly, the credible intervals of both methods achieve the nominal level, although iPIPE seems conservative while at the same time reducing the average width of the intervals. This indicates that iPIPE retains its accuracy in quantifying uncertainty even under these sparse settings.

6 |. DISCUSSION

In this article, we deal with the estimation of a monotone response surface defined on multiple factors and observed at a large number $K$ of distinct conditions. We make two main contributions. First, we have proposed an estimation method, called iPIPE, obtained by inverting a partially ordered classifier ensemble (PIPE-classifiers). While the PIPE-classifiers are motivated by a decision-theoretic framework with a classification-type gain function, they may be viewed as a projection of Bayes classifiers onto the constrained space of partial ordering. iPIPE is nonparametric in that the method does not rely on any assumptions (e.g., additivity and smoothness) other than monotonicity. In our data examples and simulations, we have demonstrated that point estimation based on iPIPE behaves similarly to PAVA. The Bayesian decision-theoretic framework facilitates interval estimation, and we have demonstrated that iPIPE-based 95% credible intervals achieve the nominal frequentist coverage probability and, in fact, are conservative in our simulation scenarios. Such conservativeness interestingly comes with higher precision (shorter widths) compared with unconstrained Bayesian inference, and warrants further investigation. Additionally, simulation results consistently show that iPIPE has smaller mse than unconstrained estimation, and the efficiency gain is particularly substantial when sampling of conditions is sparse. iPIPE is also versatile. It can be applied with many common statistical models described in Section 4, and it is potentially applicable to advanced semi-parametric models such as those in spatiotemporal modeling; e.g., in Gaussian processes for spatial data (e.g., Banerjee et al. (2008); Datta et al. (2016)), the estimation of the cross-covariance function may be improved using iPIPE, as the covariance between two locations is conceivably nonincreasing in the distance between the locations. This represents an interesting and important line of future research.

Second, we have proposed algorithms that render iPIPE computationally feasible for estimating moderate-to-high-dimensional response surfaces, whereas the existing literature on estimating multivariate monotone functions says little about situations with $D > 4$. Specifically, we have proposed a sweep algorithm and have proved that it gives the true iPIPE $\hat\theta$ defined in (7). At first glance, estimation by inverting a classification problem is more computationally intensive than the classification problem itself, because it involves iterating a threshold $t$ on a fine grid and solving the classifiers for each $t$. The sweep algorithm is interesting in that it takes advantage of the iteration step, together with a sweep step (step 2c), to reduce the optimisation problem in classification (PIPE) to a more manageable subset maximisation step (step 2b). That is, the sweep algorithm integrates the two problems: estimation (iPIPE) can be a means to evaluating the classifiers (PIPE), while the former is in principle constructed by evaluating the latter at all thresholds. We have also proposed a sequential random subset maximisation (SRSM) algorithm to supplement the sweep algorithm. The idea of SRSM is to further reduce the subset maximisation step in the sweep algorithm (step 2b) to even smaller computation tasks over sequentially selected random subsets. While there is no theoretical guarantee of giving the true maximisers, the SRSM method identifies the true maximisers in our data illustrations, and its likelihood of success can be enhanced by running the algorithm many times. We have applied the sweep/SRSM algorithm to analyse simulated data with $D \ge 10$ factors and $K > 1000$ conditions. We note that the computational costs of SRSM grow linearly in $K$, as opposed to $|\Gamma|$. In addition, while SRSM performs computation tasks over subsets of conditions sequentially, a possible alternative is to perform the maximisation of each subset in parallel, and then pool and harmonise the results with respect to the constraint. Subset maximisation thus naturally lends itself to a divide-and-conquer approach (Guhaniyogi and Banerjee, 2018; Jordan et al., 2019), which can be implemented in parallel on multi-core machines or high-performance computing clusters. As such, the method can be scaled to address massive problems with large dimension $D$ by leveraging the underlying computational architecture.

Supplementary Material

final submitted supplement

Acknowledgements

This work was supported by NIH grants R01HL153642, R01MH109496, and UL1TR001873. This work was also supported by the Robert N. Butler Columbia Aging Center of Columbia University.

Appendix

Appendix A | Proof of Lemma 1

For brevity, we omit the threshold $t$ in the proof of Lemma 1. Restating Lemma 1, we aim to prove that $\hat{\gamma}_{i1}^{k} \ge \hat{\gamma}_{i0}^{k}$ for all $k \in \{1, \ldots, K\}$ for a given $i$.

First, since $\hat{\gamma}_{i0} \in \Gamma_{i0}$ and $\hat{\gamma}_{i1} \in \Gamma_{i1}$, we have $\hat{\gamma}_{i1}^{i} = 1 > 0 = \hat{\gamma}_{i0}^{i}$ by definition.

Next, partition the set $\{1, \ldots, K\}$ into $\mathcal{L}_i = \{k : x_k \prec x_i\}$, $\mathcal{U}_i = \{k : x_k \succ x_i\}$, and $\mathcal{I}_i = \{1, \ldots, K\} \setminus (\mathcal{L}_i \cup \mathcal{U}_i \cup \{i\})$. Because $\hat{\gamma}_{i0}^{i} = 0$, we have $\hat{\gamma}_{i0}^{k} = 0$ for $k \in \mathcal{L}_i$ by monotonicity; and since $\hat{\gamma}_{i1}^{k} \in \{0, 1\}$, we have $\hat{\gamma}_{i1}^{k} \ge 0 = \hat{\gamma}_{i0}^{k}$ on $\mathcal{L}_i$. Similarly, we observe that $\hat{\gamma}_{i1}^{k} = 1$ for $k \in \mathcal{U}_i$ because $\hat{\gamma}_{i1}^{i} = 1$, and hence $\hat{\gamma}_{i1}^{k} \ge \hat{\gamma}_{i0}^{k}$ on $\mathcal{U}_i$. Further split $\mathcal{I}_i$ into two sets: $\mathcal{I}_{i0} = \{k \in \mathcal{I}_i : \hat{\gamma}_{i0}^{k} = 0\}$ and $\mathcal{I}_{i1} = \{k \in \mathcal{I}_i : \hat{\gamma}_{i0}^{k} = 1\}$. By the definition of $\mathcal{I}_{i0}$, we have $\hat{\gamma}_{i0}^{k} = 0 \le \hat{\gamma}_{i1}^{k} \in \{0, 1\}$ for $k \in \mathcal{I}_{i0}$. The proof of Lemma 1 will be completed by proving:

Claim 1. $\hat{\gamma}_{i1}^{k} = 1$ for $k \in \mathcal{I}_{i1}$.

Recall that $g^{\mathcal{I}_{i1}} = \{g^{k} : k \in \mathcal{I}_{i1}\}$ denotes the subvector of $g$ on $\mathcal{I}_{i1}$, and suppose $g^{\mathcal{I}_{i1}} \in \Gamma^{\mathcal{I}_{i1}}$. Construct a classifier ensemble $\tilde{\gamma}_{i0} = (\tilde{\gamma}_{i0}^{1}, \ldots, \tilde{\gamma}_{i0}^{K})$ as follows:

$$\tilde{\gamma}_{i0}^{k} = \hat{\gamma}_{i0}^{k} \ \text{ for } k \in -\mathcal{I}_{i1} = \mathcal{L}_i \cup \mathcal{U}_i \cup \mathcal{I}_{i0} \cup \{i\}, \qquad \text{and} \qquad \tilde{\gamma}_{i0}^{\mathcal{I}_{i1}} = g^{\mathcal{I}_{i1}}. \tag{22}$$

Claim 2. $\tilde{\gamma}_{i0} \in \Gamma$. Hence, $\tilde{\gamma}_{i0} \in \Gamma_{i0}$ because $\tilde{\gamma}_{i0}^{i} = 0$.

Proof of Claim 2: Since $\hat{\gamma}_{i0} \in \Gamma_{i0} \subset \Gamma$, we have $\hat{\gamma}_{i0}^{-\mathcal{I}_{i1}} \in \Gamma^{-\mathcal{I}_{i1}}$. Thus, $\tilde{\gamma}_{i0}^{-\mathcal{I}_{i1}} = \hat{\gamma}_{i0}^{-\mathcal{I}_{i1}} \in \Gamma^{-\mathcal{I}_{i1}}$. Also, by (22), we have $\tilde{\gamma}_{i0}^{\mathcal{I}_{i1}} = g^{\mathcal{I}_{i1}} \in \Gamma^{\mathcal{I}_{i1}}$. That is, the partial ordering of $\tilde{\gamma}_{i0}$ holds within $-\mathcal{I}_{i1}$ and within $\mathcal{I}_{i1}$. To prove Claim 2, it remains to show that the partial ordering of $\tilde{\gamma}_{i0}$ holds for every pair $k \in \mathcal{I}_{i1}$ and $k' \in -\mathcal{I}_{i1}$:

  • First, consider the case $k' \in \mathcal{L}_i \cup \mathcal{I}_{i0} \cup \{i\}$, where $\hat{\gamma}_{i0}^{k'} = 0$; and recall that $\hat{\gamma}_{i0}^{k} = 1$ for $k \in \mathcal{I}_{i1}$. Since $\hat{\gamma}_{i0} \in \Gamma$, the partial ordering holds between $\hat{\gamma}_{i0}^{k}$ and $\hat{\gamma}_{i0}^{k'}$, implying that $x_{k'} \nsucc x_{k}$. Because $\tilde{\gamma}_{i0}^{k'} = \hat{\gamma}_{i0}^{k'} = 0$, $g^{k}$ can take on any value while the partial ordering holds between $g^{k}$ and $\tilde{\gamma}_{i0}^{k'}$.

  • Second, consider the case $k' \in \mathcal{U}_i$, where $\hat{\gamma}_{i0}^{k'} = 1$. Note that $x_{k} \nsucc x_{k'}$: $x_{k} \succ x_{k'}$ would imply $x_{k} \succ x_{i}$, which would in turn put $k \in \mathcal{U}_i$ by the definition of $\mathcal{U}_i$, contradicting $k \in \mathcal{I}_{i1}$. As a result, because $\tilde{\gamma}_{i0}^{k'} = \hat{\gamma}_{i0}^{k'} = 1$, $g^{k}$ can take on any value while the partial ordering holds between $g^{k}$ and $\tilde{\gamma}_{i0}^{k'}$. (The case $k' \in \mathcal{U}_i$ with $\hat{\gamma}_{i0}^{k'} = 0$ is covered by the argument in the first case.)

This completes the proof of Claim 2.

Next, write $H(g;t) = \prod_{k=1}^{K} \phi_k^{g^{k}} \rho_k^{1-g^{k}}$, where

$$\phi_k = \phi_k(t) = \epsilon\, p_k(t)\, w_k \quad \text{and} \quad \rho_k = \rho_k(t) = (1-\epsilon)\{1 - p_k(t)\}\, w_k; \tag{23}$$

that is, the dependence on $t$ is omitted for brevity. Then we have

$$H(\hat{\gamma}_{i0};t) = \rho_i \prod_{k \in \mathcal{L}_i} \rho_k \prod_{k \in \mathcal{U}_i} \phi_k^{\hat{\gamma}_{i0}^{k}} \rho_k^{1-\hat{\gamma}_{i0}^{k}} \prod_{k \in \mathcal{I}_{i0}} \rho_k \prod_{k \in \mathcal{I}_{i1}} \phi_k \tag{24}$$

based on the definitions of $\mathcal{L}_i$, $\mathcal{I}_{i0}$, and $\mathcal{I}_{i1}$. Similarly, we can write

$$H(\tilde{\gamma}_{i0};t) = \rho_i \prod_{k \in \mathcal{L}_i} \rho_k \prod_{k \in \mathcal{U}_i} \phi_k^{\hat{\gamma}_{i0}^{k}} \rho_k^{1-\hat{\gamma}_{i0}^{k}} \prod_{k \in \mathcal{I}_{i0}} \rho_k \prod_{k \in \mathcal{I}_{i1}} \phi_k^{g^{k}} \rho_k^{1-g^{k}} \tag{25}$$

where $g^{\mathcal{I}_{i1}} \in \Gamma^{\mathcal{I}_{i1}}$. Because $\hat{\gamma}_{i0}$ maximises $H$ on $\Gamma_{i0}$ by definition (8) and $\tilde{\gamma}_{i0} \in \Gamma_{i0}$ per Claim 2, we have $H(\hat{\gamma}_{i0};t) \ge H(\tilde{\gamma}_{i0};t)$. Taking the ratio of (24) to (25) and cancelling the common factors, we obtain the inequality

$$\prod_{k \in \mathcal{I}_{i1}} \left( \frac{\phi_k}{\rho_k} \right)^{1-g^{k}} \ge 1 \tag{26}$$

for any $g^{\mathcal{I}_{i1}} \in \Gamma^{\mathcal{I}_{i1}}$.

Finally, suppose that $\hat{\gamma}_{i1}^{k} = 0$ for some $k \in \mathcal{I}_{i1}$, where $\hat{\gamma}_{i1}$ maximises $H$ on $\Gamma_{i1}$ per (8). Construct an ensemble $\tilde{\gamma}_{i1} \in \{0, 1\}^{K}$ as follows: define

$$\tilde{\gamma}_{i1}^{k} = \begin{cases} \hat{\gamma}_{i1}^{k} & \text{for } k \in -\mathcal{I}_{i1} \\ 1 & \text{for } k \in \mathcal{I}_{i1}. \end{cases} \tag{27}$$

Using arguments similar to those in the proof of Claim 2 above, we can show that $\tilde{\gamma}_{i1} \in \Gamma_{i1}$. Further, since $\hat{\gamma}_{i1} \in \Gamma$, the subvector $\hat{\gamma}_{i1}^{\mathcal{I}_{i1}} \in \Gamma^{\mathcal{I}_{i1}}$. Then we obtain

$$\frac{H(\tilde{\gamma}_{i1};t)}{H(\hat{\gamma}_{i1};t)} = \prod_{k \in \mathcal{I}_{i1}} \left( \frac{\phi_k}{\rho_k} \right)^{1-\hat{\gamma}_{i1}^{k}} \ge 1. \tag{28}$$

The inequality in (28) is a result of (26) and the fact that $\hat{\gamma}_{i1}^{\mathcal{I}_{i1}} \in \Gamma^{\mathcal{I}_{i1}}$. However, inequality (28) contradicts the definition of $\hat{\gamma}_{i1}$ as the maximiser of $H$ on $\Gamma_{i1}$, except when $\hat{\gamma}_{i1}^{k} = 1$ for all $k \in \mathcal{I}_{i1}$. Thus, by contradiction, $\hat{\gamma}_{i1}^{k} \neq 0$ for any $k \in \mathcal{I}_{i1}$. This completes the proof of Claim 1 and of Lemma 1.
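As a sanity check of Lemma 1, rather than part of the formal proof, the following toy R example with $K = 2$ comparable conditions (and assumed stand-in values for $p_k(t)$, $w_k$, and $\epsilon$) enumerates $H$ over the constrained classes and confirms that the maximiser over $\Gamma_{i1}$ dominates the maximiser over $\Gamma_{i0}$ componentwise:

```r
## Toy numerical check of Lemma 1 under assumed values. With K = 2 conditions
## ordered x_1 < x_2, the monotone ensembles are (0,0), (0,1) and (1,1).
## Take i = 1 and compare the constrained maximisers.
eps <- 0.5
w   <- c(1, 1)
p   <- c(0.3, 0.8)                     # stand-in posterior probabilities p_k(t)
phi <- eps * p * w
rho <- (1 - eps) * (1 - p) * w
H   <- function(g) prod(phi^g * rho^(1 - g))

Gamma <- list(c(0, 0), c(0, 1), c(1, 1))       # constrained space
G0 <- Filter(function(g) g[1] == 0, Gamma)     # Gamma_{i0}: gamma^i = 0
G1 <- Filter(function(g) g[1] == 1, Gamma)     # Gamma_{i1}: gamma^i = 1
g0 <- G0[[which.max(sapply(G0, H))]]           # maximiser over Gamma_{i0}
g1 <- G1[[which.max(sapply(G1, H))]]           # maximiser over Gamma_{i1}
all(g1 >= g0)                                  # TRUE, as Lemma 1 asserts
```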

Appendix B | Proofs of Theorem 1, Proposition 3, and Proposition 4

Proof of Theorem 1: First consider a fixed $i$ as in Lemma 1. The maximiser $\hat{\gamma}(t)$ of $H(g;t)$ over $\Gamma$ will be either $\hat{\gamma}_{i0}(t)$ or $\hat{\gamma}_{i1}(t)$, the respective maximisers over $\Gamma_{i0}$ and $\Gamma_{i1}$. Specifically,

$$\hat{\gamma}^{i}(t) = 0 \iff \hat{\gamma}(t) = \hat{\gamma}_{i0}(t) \tag{29}$$

$$\iff H(\hat{\gamma}_{i0}(t); t) > H(\hat{\gamma}_{i1}(t); t). \tag{30}$$

Equation (29) holds because $\hat{\gamma}_{i0}(t) \in \Gamma_{i0}$, and expanding (30) gives

$$\prod_{k \in \{i\} \cup \mathcal{L}_i} \rho_k(t) \prod_{k \in \mathcal{U}_i \cup \mathcal{I}_i} \phi_k(t)^{\hat{\gamma}_{i0}^{k}} \rho_k(t)^{1-\hat{\gamma}_{i0}^{k}} > \prod_{k \in \{i\} \cup \mathcal{U}_i} \phi_k(t) \prod_{k \in \mathcal{L}_i \cup \mathcal{I}_i} \phi_k(t)^{\hat{\gamma}_{i1}^{k}} \rho_k(t)^{1-\hat{\gamma}_{i1}^{k}} \tag{31}$$

where $\rho_k(t)$ and $\phi_k(t)$ are defined in (23). Dividing both sides of (31) by its right-hand side further gives

$$\frac{\rho_i(t)}{\phi_i(t)} \times \prod_{k \in \mathcal{U}_i} \left\{\frac{\rho_k(t)}{\phi_k(t)}\right\}^{1-\hat{\gamma}_{i0}^{k}} \times \prod_{k \in \mathcal{L}_i} \left\{\frac{\rho_k(t)}{\phi_k(t)}\right\}^{\hat{\gamma}_{i1}^{k}} \times \prod_{k \in \mathcal{I}_i} \left\{\frac{\rho_k(t)}{\phi_k(t)}\right\}^{\hat{\gamma}_{i1}^{k}-\hat{\gamma}_{i0}^{k}} > 1. \tag{32}$$

The first three terms in (32) are increasing functions of $t$, because it can easily be verified that $\rho_k(t)$ is increasing and $\phi_k(t)$ is decreasing in $t$. In addition, Lemma 1 implies that $\hat{\gamma}_{i1}^{k} - \hat{\gamma}_{i0}^{k} \ge 0$, and therefore the fourth term in (32) is also increasing in $t$. Therefore, for any $t' > t$, inequality (32) implies

$$\frac{\rho_i(t')}{\phi_i(t')} \times \prod_{k \in \mathcal{U}_i} \left\{\frac{\rho_k(t')}{\phi_k(t')}\right\}^{1-\hat{\gamma}_{i0}^{k}} \times \prod_{k \in \mathcal{L}_i} \left\{\frac{\rho_k(t')}{\phi_k(t')}\right\}^{\hat{\gamma}_{i1}^{k}} \times \prod_{k \in \mathcal{I}_i} \left\{\frac{\rho_k(t')}{\phi_k(t')}\right\}^{\hat{\gamma}_{i1}^{k}-\hat{\gamma}_{i0}^{k}} > 1 \tag{33}$$

which is equivalent to $\hat{\gamma}(t') = \hat{\gamma}_{i0}(t')$, that is, $\hat{\gamma}^{i}(t') = 0$, using the same logic as (29) and (30). We have thus shown that $\hat{\gamma}^{i}(t) = 0$ implies $\hat{\gamma}^{i}(t') = 0$ for any given $i$, completing the proof of Theorem 1.
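The monotonicity used in this proof is easy to verify numerically. The following R sketch assumes a Beta posterior purely as a stand-in for $F_k$, so that $p_k(t) = 1 - F_k(t)$ is decreasing in $t$, and checks that the ratio $\rho_k(t)/\phi_k(t)$ from (23) is increasing in $t$:

```r
## A small numerical check (illustrative only) that rho_k(t)/phi_k(t) in (23)
## increases in t whenever p_k(t) = P(theta_k > t | data) decreases in t.
eps    <- 0.5
w      <- 1
t_grid <- seq(0.05, 0.95, by = 0.05)
p_t    <- 1 - pbeta(t_grid, shape1 = 3, shape2 = 5)      # p_k(t), decreasing
ratio  <- ((1 - eps) * (1 - p_t) * w) / (eps * p_t * w)  # rho_k(t)/phi_k(t)
all(diff(ratio) > 0)                                     # TRUE: monotone in t
```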

Proof of Proposition 3: Let $F_k(t \mid y_1, \ldots, y_K)$ denote the posterior cdf of $\theta_k$ and assume that it is continuous, so that $p_k(t) = E\{\gamma_k(t) \mid y_1, \ldots, y_K\} = 1 - F_k(t)$. The estimator $\hat{\theta}_k$ will thus solve $1 - F_k(\hat{\theta}_k) = 1 - \epsilon$, that is, $\hat{\theta}_k = F_k^{-1}(\epsilon)$.
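In practice, the posterior cdf $F_k$ is typically represented by MCMC output, so the estimator in Proposition 3 reduces to an empirical posterior quantile. A minimal sketch, with draws simulated from a Beta distribution as a stand-in for actual MCMC output:

```r
## Per Proposition 3, the estimator is the posterior epsilon-quantile,
## theta_hat_k = F_k^{-1}(epsilon), computed here from posterior draws.
set.seed(2)
eps         <- 0.5
draws_k     <- rbeta(4000, shape1 = 3, shape2 = 5)  # posterior draws of theta_k
theta_hat_k <- unname(quantile(draws_k, probs = eps))
theta_hat_k                                # the posterior median when eps = 0.5
```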

Proof of Proposition 4: First, we note that $\hat{\gamma}^{C_t} = 1$ maximises $H^{C_t}$ over $\Gamma^{C_t}$, by applying Proposition 2 with the fact that $\hat{\gamma}_k^{B}(t) = 1$ for $k \in C_t$ (hence $C_t \subseteq Z_t$). Next, we note that $x_{k'} \nsucc x_{k}$ for any pair $k \in C_t$ and $k' \in -C_t$. Thus, the ensemble formed by putting $\hat{\gamma}^{-C_t}$ and $\hat{\gamma}^{C_t}$ together will belong to $\Gamma$; and since they maximise $H^{-C_t}$ and $H^{C_t}$ respectively, the ensemble thus formed maximises $H = H^{-C_t} \times H^{C_t}$.

Footnotes

Conflict of interest

The authors have no conflicts of interest to disclose.

Data availability statement

The data underlying this article will be shared on reasonable request to the corresponding author.

References

  1. Ayer M, Brunk HD, Ewing GM, Reid WT and Silverman E (1955) An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26, 641–647.
  2. Bacchetti P (1989) Additive isotonic regression. Journal of the American Statistical Association, 84, 289–294.
  3. Banerjee S, Gelfand AE, Finley AO and Sang H (2008) Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. B, 70, 825–848.
  4. Barlow RE, Bartholomew DJ, Bremner JM and Brunk HD (1972) Statistical Inference Under Order Restrictions. John Wiley & Sons, New York.
  5. Bornkamp B, Ickstadt K and Dunson D (2010) Stochastically ordered multiple regression. Biostatistics, 11, 419–431.
  6. Brunk HD (1955) Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics, 26, 607–616.
  7. Burridge J (1981) Empirical Bayes analysis of survival time data. J. R. Statist. Soc. B, 43, 65–75.
  8. Cheung K, Ling W, Karr CJ, Weingardt K, Schueller SM and Mohr DC (2018) Evaluation of a recommender app for apps for the treatment of depression and anxiety: an analysis of longitudinal user engagement. Journal of the American Medical Informatics Association, 25, 955–962.
  9. Cheung Y, Chandereng T and Diaz KM (2022) A novel framework to estimate multidimensional minimum effective doses using asymmetric posterior gain and ε-tapering. The Annals of Applied Statistics, 16, 1445–1458.
  10. Chung Y, Ivanova A, Hudgens M and Fine J (2018) Partial likelihood estimation of isotonic proportional hazards models. Biometrika, 105, 133–148.
  11. Datta A, Banerjee S, Finley AO and Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812.
  12. Dellaportas P and Smith A (1993) Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Appl. Statist., 42, 443–459.
  13. Fu A, Narasimhan B and Boyd S (2020) CVXR: An R package for disciplined convex optimization. Journal of Statistical Software, 94, 1–34.
  14. Gelman A, Carlin J, Stern H and Rubin D (1995) Bayesian Data Analysis. Chapman & Hall.
  15. Gilks W and Wild P (1992) Adaptive rejection sampling for Gibbs sampling. Appl. Statist., 41, 337–348.
  16. Guhaniyogi R and Banerjee S (2018) Meta-kriging: scalable Bayesian modeling and inference for massive spatial datasets. Technometrics, 60, 430–444.
  17. Holmes CC and Heard NA (2003) Generalized monotonic regression using random change points. Statistics in Medicine, 22, 623–638.
  18. Ibrahim J and Laud P (1991) On Bayesian analysis of generalized linear models using Jeffreys's prior. Journal of the American Statistical Association, 86, 981–986.
  19. Jordan MI, Lee JD and Yang Y (2019) Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681.
  20. Leitenstorfer F and Tutz G (2007) Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics, 8, 654–673.
  21. Lin L and Dunson DB (2014) Bayesian monotone regression using Gaussian process projection. Biometrika, 101, 303–317.
  22. Mander A and Sweeting M (2015) A product of independent beta probabilities dose escalation design for dual-agent phase I trials. Statistics in Medicine, 34, 1261–1276.
  23. McCullagh P and Nelder J (1989) Generalized Linear Models. Chapman & Hall/CRC, second edn.
  24. Morton-Jones T, Diggle P, Parker L, Dickinson HO and Binks K (2000) Additive isotonic regression models in epidemiology. Statistics in Medicine, 19, 849–859.
  25. Ramsay JO (1988) Monotone regression splines in action. Statistical Science, 3, 425–441.
  26. Robertson T, Wright FT and Dykstra RL (1988) Order Restricted Statistical Inference. John Wiley & Sons, New York.
  27. Sinha D and Dey DK (1997) Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92, 1195–1212.
  28. Sinha D, Ibrahim J and Chen M (2003) A Bayesian justification of Cox's partial likelihood. Biometrika, 90, 629–641.
  29. Stan Development Team (2021) RStan: the R interface to Stan. R package version 2.21.3, https://mc-stan.org/.
  30. Stein J, Rodstein BM, Levine SR, Cheung K, Sicklick A, Silver B, Hedeman R, Egan A, Borg-Jensen P and Magdon-Ismail Z (2022) Which road to recovery? Factors influencing postacute stroke discharge destinations: A Delphi study. Stroke, 53, 947–955.
  31. Wang Y and Taylor J (2004) Monotone constrained tensor-product B-spline with application to screening studies. The University of Michigan Department of Biostatistics Working Paper Series, 1022, Berkeley Electronic Press.
  32. Wright FT (1982) Monotone regression estimates for grouped observations. The Annals of Statistics, 10, 278–286.
