Monotone response surface of multi-factor condition: estimation and Bayes classifiers

Ying Kuen Cheung; Keith M Diaz

doi:10.1093/jrsssb/qkad014

. Author manuscript; available in PMC: 2024 Apr 1.

Published in final edited form as: J R Stat Soc Series B Stat Methodol. 2023 Mar 22;85(2):497–522. doi: 10.1093/jrsssb/qkad014

Monotone response surface of multi-factor condition: estimation and Bayes classifiers

Ying Kuen Cheung ^1,^*, Keith M Diaz ²

PMCID: PMC10919322 NIHMSID: NIHMS1970282 PMID: 38464683

Abstract

We formulate the estimation of monotone response surface of multiple factors as the inverse of an iteration of partially ordered classifier ensembles. Each ensemble (called PIPE-classifiers) is a projection of Bayes classifiers on the constrained space. We prove the inverse of PIPE-classifiers (iPIPE) exists, and propose algorithms to efficiently compute iPIPE by reducing the space over which optimisation is conducted. The methods are applied in analysis and simulation settings where the surface dimension is higher than what the isotonic regression literature typically considers. Simulation shows iPIPE-based credible intervals achieve nominal coverage probability and are more precise compared to unconstrained estimation.

Keywords: Clinical decision support tool, partial ordering, posterior quantiles, sweep algorithm, weighted posterior gain

1 |. INTRODUCTION

In clinical studies and health systems, interventions and patient conditions are often defined by multiple factors. To assess the total effect of an intervention or a condition, we can estimate the response surface as a multivariate function of the individual factors. In this article, we focus on monotone response surface, which is a reasonable assumption in many applications such as dose-response studies and clinical decision making. Specifically, consider a study with $K$ multi-factor conditions. Let $x_{k} = (x_{k 1}, x_{k 2}, \dots, x_{k D})$ denote the $k$ th condition, for $k \in {1, 2, \dots, K}$ , where $x_{k d}$ indicates the state of the $d$ th factor, and $θ_{k} = θ (x_{k})$ denote the parameter of interest associated the condition. We are concerned with the estimation of monotone response surface $θ = (θ_{1}, θ_{2}, \dots, θ_{K})$ , where the number $K$ is large in many applications. We assume without loss of generality that $θ$ is nondecreasing in $x$ in terms of the partial Euclidean ordering (≻): if $x_{k} ≻ x_{k^{'}}$ , then $θ_{k} \geq θ_{k^{'}}$ , where $x_{k} ≻ x_{k^{'}}$ denotes the event $x_{k d} \geq x_{k^{'} d}$ for each component $d$ with at least one strict inequality.

To motivate our work, consider a recent Delphi study where an expert panel identified important factors that influence the selection of postacute care for stroke patients (Stein et al., 2022), including four main factors: likelihood of benefitting from active rehabilitation (factor 1), need for clinicians with specialised rehabilitation skills (factor 2), need for ongoing medical and nursing care (factor 3), and patient’s ability to tolerate rehabilitation (factor 4); and three minor factors: family/caregiver support (factor 5), likelihood of return to community (factor 6), and ability to return to physical home (factor 7). While the presence of each of these factors increases the likelihood of referral to an inpatient rehabilitation program, we currently plan to conduct chart reviews to understand the combined effects of these factors and develop a clinical decision support tool. In each patient chart, a main factor will be scored as 0, 1, and 2 respectively for the answer “no”, “uncertain”, and “yes”, and a minor factor will be scored as 0 for “no/uncertain” and 1 for “yes”. The outcome will be noted as whether the patient was referred to rehabilitation. In summary, each condition consists of $D = 7$ factors with each taking on two or three possible values and there are a total of $K = 3^{4} \times 2^{3} = 648$ conditions. Unconstrained estimate of each $θ_{k}$ , the underlying referral probability associated with condition $k$ , can be obtained based on the number $m_{k}$ of patients under condition $k$ and the number $y_{k}$ of patients referred to rehabilitation among them, for $k = 1, \dots, 648$ . In the simple cases of estimating population parameters associated with different conditions, we will use $y_{k}$ , to generically denote the data associated with condition $k$ with distribution $f (y_{k} ∣ θ_{k}, ϑ_{k})$ where the nuisance parameter $ϑ_{k}$ may be shared across conditions. Notations for increasingly complicated model set up will be defined and explained for specific applications in Section 4.

There is a large literature on monotone or isotonic regression, which can be formulated as a restricted least squares optimisation problem and can be solved using the pool-adjacent-violators-algorithm (PAVA); see Brunk (1955), Ayer et al. (1955), Barlow et al. (1972), and Robertson et al. (1988). Numerous approaches have been proposed to deal with multivariate isotonic regression, including additive models (Bacchetti, 1989; Morton-Jones et al., 2000), spline methods using monotone basis functions (Ramsay, 1988; Leitenstorfer and Tutz, 2007), Bayesian mixture modeling (Bornkamp et al., 2010), and projective Gaussian process (Lin and Dunson, 2014). These methods deal with situations with continuous functions, where additional assumptions such as piecewise linearity, additivity, and smoothness are used to make computations tractable. A monotone response surface defined over a continuous support may also be estimated using tensor-product splines with monotone basis functions on the margins and appropriate constraints on the coefficients of the basis functions; see Wang and Taylor (2004) for an application that uses $B$ -spline bases. As such, computations can be formulated as a convex optimisation problem for which statistical packages such as the R package CVXR (Fu et al., 2020) can implement quite efficiently. Isotonic regression has also been recently studied for survival analysis. Chung et al. (2018) propose a pseudo-iterative convex minorant algorithm, with theoretical justifications, to implement PAVA and maximise the partial likelihood under isotonic proportional hazards models for right-censored data. While their approach exhibits computational stability under a piecewise constant assumption, the algorithm is applied to a single continuous covariate and focuses on point estimation. Generally, optimisation and inference of isotonic regression can become complex and challenging as the dimension $D$ increases, while most methods have been demonstrated in problems with relatively low dimension ( $D = 2$ to 4).

Our work is motivated by a number of considerations that render the above mentioned approaches not directly applicable. First, we focus on applications where the response surface is observed on discrete levels per factor with a moderate-to-high number $D$ of factors. While Wright (1982) studies maximum likelihood estimation of a univariate function observed on discrete levels, there is relatively little discussion on isotonic regression for multiple discrete factors. Second, we adopt a Bayesian decision-theoretic framework to deal with inference including interval estimation. For multivariate isotonic functions of discrete factors, a pragmatic Bayesian approach will first draw from the posterior distribution based on an unconstrained model and then include only draws that meet partial ordering; see Holmes and Heard (2003) for example. As will be illustrated, the constrained posterior thus obtained may cause bias and its feasibility is limited as $D$ increases. Third, to ensure broad applicability, we aim to develop a general approach that can work with different statistical and regression models, including generalised linear models, mixed effects models, Bayesian hierarchical models, and survival models.

To address the above considerations, in this article, we propose estimation of the response surface $θ$ by inverting an iterated sequence of partially ordered classifier ensembles, each of which is solved by projecting unconstrained Bayes classifiers onto the space constrained by partial ordering. The proposed classifier ensemble may be viewed as an extension of the product-of-independent-probability-escalation (PIPE) method by Mander and Sweeting (2015) for two-dimensional dose-response, and will thus be called PIPE-classifiers; and the estimator for $θ$ obtained by its inverse will be called iPIPE. Cheung et al. (2022) develop a decision-theoretic framework to motivate PIPE and outline the principle of extension to dimension $D > 2$ without examining its computational feasibility. In this article, we will propose efficient computation algorithms that solve PIPE-classifiers and iPIPE simultaneously and demonstrate its feasibility for high dimension problems.

2 |. METHODS

2.1 |. A partially ordered classifier ensemble

We first introduce a classification problem of condition $k$ with respect to some threshold $t$ . Define the classifier ensemble $γ (t) = (γ_{1} (t), γ_{2} (t), \dots, γ_{K} (t))$ where $γ_{k} (t) = I (θ_{k} > t)$ and $I (\cdot)$ is an indicator function. As a nondecreasing function in $θ_{k}, γ (t)$ is also nondecreasing in $x$ in terms of partial ordering. We let $Γ$ denote the constrained space where $γ (t)$ lives. More generally, define

Γ (𝒮) = \{g \in {0, 1}^{|𝒮|} : g_{k} \geq g_{k^{'}} for x_{k} ≻ x_{k^{'}}; k, k^{'} \in 𝒮\} for some 𝒮 \subseteq \{1, \dots, K\} .

(1)

Then $γ (t) \in Γ ({1, \dots, K}) ≔ Γ$ . Further let $γ_{𝒮} (t) = (γ_{k} (t) : k \in 𝒮)$ denote a subvector of $γ (t)$ on $𝒮$ and it is easy to see that $γ (t) \in Γ$ implies $γ_{𝒮} (t) \in Γ (𝒮)$ for any $𝒮$ .

We consider estimation by maximising the objective function $H_{𝒮} (g; t)$ defined as follows:

{\hat{γ}}_{𝒮} (t) = arg max_{g \in Γ (𝒮)} H_{𝒮} (g; t) ≔ arg max_{g \in Γ (𝒮)} \prod_{k \in 𝒮} {[{ϵ p_{k} (t)}^{g_{k}} {(1 - ϵ) (1 - p_{k} (t))}^{1 - g_{k}}]}^{w_{k}}

(2)

where $p_{k} (t) = E \{γ_{k} (t) ∣ y_{1}, \dots, y_{K}\}$ is the expectation of $γ_{k} (t)$ taken with respect to the posterior distribution of $θ$ (to be elaborated in Section 4), and the weight $w_{k} \geq 0$ is chosen to reflect the information content about condition $k$ and some $ϵ \in (0, 1)$ . For brevity, we suppress the dependence of $H_{𝒮} (g; t)$ on $ϵ$ in our notation. The subscript $𝒮$ will also be omitted when $𝒮 = {1, \dots, K}$ , e.g., writing $γ_{{1, \dots, K}} (t)$ as $γ (t)$ , $H_{{1, \dots, K}}$ as $H$ , etc.

Proposition 1. $H_{𝒮} (g; t)$ is a weighted product of posterior gains over $k \in 𝒮$ with respect to the gain function

h_{ϵ} (g; γ_{k} (t)) = ϵ g γ_{k} (t) + (1 - ϵ) (1 - g) (1 - γ_{k} (t)), for g = 0, 1 .

(3)

Proposition 1 can be easily verified by taking expectations of the right-hand side of (3) with respect to the posterior distribution of $θ$ for each $k$ . Under this framework, $ϵ$ may be viewed as a decision parameter that defines the relative gains of a true negative and a true position decision (Cheung et al., 2022), and may be used to control classification errors.

When we estimate an individual $γ_{k} (t)$ we can also show ${\hat{γ}}_{k}^{B} (t) = I \{p_{k} (t) > 1 - ϵ\}$ is a Bayes estimator for $γ_{k} (t)$ in that it maximises the gain function (3).

Proposition 2. If ${\hat{γ}}^{B} (t) = ({\hat{γ}}_{1}^{B} (t), \dots, {\hat{γ}}_{K}^{B} (t)) \in Γ$ , then $H (g; t)$ is maximised at $g = {\hat{γ}}^{B} (t)$ .

Proposition 2 can be verified by observing that ${\hat{γ}}_{k}^{B} (t)$ maximises the $k$ th factor in the product in (2) so that

H ({\hat{γ}}^{B} (t); t) \geq H (g; t) for all g \in {0, 1}^{K} .

(4)

Combining (2) and (4) and setting $𝒮 = {1, \dots, K}$ , we can write

\hat{γ} (t) = arg min_{g \in Γ} |H ({\hat{γ}}^{B} (t); t) - H (g; t)| .

(5)

As such, the classifier ensemble $\hat{γ} (t)$ may be viewed as a projection of the Bayes classifier ensemble on the constrained space $Γ$ . Proposition 2 implies that if ${\hat{γ}}_{k}^{B} (t)$ is evaluated under a joint posterior of $θ$ with support that satisfies partial ordering, the estimator $\hat{γ} (t)$ will be its own projection. To ensure ${\hat{γ}}^{B} (t) \in Γ$ , one could use parametric models such as linear additive models to impose monotonicity. Alternatively, motivated by ease and scalability of independently computing unconstrained $p_{k} (t)$ and keeping model assumptions to a minimum, we propose applying (2) in conjunction with unconstrained distribution of $θ$ . The estimator $\hat{γ} (t)$ thus resulted will be called a PIPE-classifier, as coined in Mander and Sweeting (2015) who introduce the special case of (2) with $ϵ = 0.5$ and $w_{k} \equiv 1$ for binomial outcomes over two-dimensional grid.

2.2 |. Inverting PIPE-classifiers (iPIPE)

In this subsection, we return to the estimation of $θ$ . Viewing the estimand $θ_{k}$ as an inverse of $γ_{k} (t)$ , we may write

θ_{k} = γ_{k}^{- 1} (0) = min \{t : γ_{k} (t^{'}) = 0 for all t^{'} \geq t\} = min \{t : γ_{k} (t) = 0\} .

(6)

The last equality in (6) holds because $γ_{k} (t)$ is nonincreasing in $t$ , and as such, its inverse is unambiguously defined. If a PIPE-classifier ${\hat{γ}}_{k} (t)$ is nonincreasing in $t$ , an estimator for $θ_{k}$ can be analogously defined as its inverse:

{\hat{θ}}_{k} = {\hat{γ}}_{k}^{- 1} (0) = min \{t : {\hat{γ}}_{k} (t^{'}) = 0 for all t^{'} \geq t\} = min \{t : {\hat{γ}}_{k} (t) = 0\} .

(7)

Lemma 1. Partition $Γ$ into $Γ_{i 0} = \{g \in Γ : g_{i} = 0\}$ and $Γ_{i 1} = \{g \in Γ : g_{i} = 1\}$ for a given $i \in {1, \dots, K}$ . Define

{\hat{γ}}_{i l} (t) = arg max_{g \in Γ_{i l}} H (g; t) for I = 0, 1 .

(8)

Then ${\hat{γ}}_{i 1} (t) ≽ {\hat{γ}}_{i 0} (t)$ . That is, ${\hat{γ}}_{i 1 k} (t) \geq {\hat{γ}}_{i 0 k} (t)$ for all $k \in {1, \dots, K}$ , where ${\hat{γ}}_{i l k} (t)$ is the $k$ th element of ${\hat{γ}}_{i l} (t)$ .

Theorem 1. Let ${\hat{γ}}_{k} (t)$ denote the $k$ th element of $\hat{γ} (t)$ defined in (2). If ${\hat{γ}}_{k} (t) = 0$ , then ${\hat{γ}}_{k} (t^{'}) = 0$ for $t^{'} > t$ .

Theorem 1 shows that ${\hat{γ}}_{k} (t)$ is nonincreasing in $t$ , and as a result, the inverse of a PIPE-classifier (7) is well-defined. In addition, the following result provides the basis of choosing $ϵ$ for point and interval estimation.

Proposition 3. If ${\hat{γ}}^{B} (t) \in Γ$ for all $t$ , then ${\hat{θ}}_{k}$ is equal to the $ϵ$ -quantile of the posterior distribution of $θ_{k}$ .

Generally, we do not expect ${\hat{γ}}^{B} (t) \in Γ$ unless in special cases such as when using parametric models as discussed after Proposition 2. Proposition 3 however provides an interpretation of $ϵ$ in the context of estimation. Specifically, we may consider ${\hat{θ}}_{k}$ with $ϵ = 0.5$ , i.e., posterior median, as a point estimate for $θ_{k}$ , and obtain a 95% credible interval by evaluating ${\hat{θ}}_{k}$ with $ϵ = 0.025, 0.975$ .

The proofs of Lemma 1, Theorem 1, and Proposition 3 are given in the Appendix.

3 |. COMPUTATION ALGORITHMS

3.1 |. A sweep algorithm

The main motivation for using the PIPE-classifier (2) is that each individual factor in the product can be easily and quickly computed without consideration of partial ordering; thus computations can be scaled to deal with problems with complex partial ordering structure as $D$ increases. When the dimension $D$ and number $K$ of conditions are small, one can evaluate the set $Γ$ by brute force, i.e., enumerating each $g \in {0, 1}^{K}$ and check if it belongs to $Γ$ . Given $Γ$ is determined, the additional computational cost of (2) is $K | Γ |$ . Enumerating the entire set $Γ$ , however, becomes infeasible quickly as $D$ and $K$ increase.

Suppose there exists $t_{L}$ such that ${\hat{γ}}_{k}^{B} (t) = 1$ for all $k$ and all $t \leq t_{L}$ , and $t_{U}$ such that ${\hat{γ}}_{k}^{B} (t) = 0$ for all $k$ and $t \geq t_{U}$ . Under this assumption, Proposition 2 implies that $\hat{γ} (t_{L}) = 1$ and $\hat{γ} (t_{U}) = 0$ because ${\hat{γ}}^{B} (t_{L}), {\hat{γ}}^{B} (t_{U}) \in Γ$ . Then the following sweep algorithm solves the inverse $\hat{θ}$ of PIPE-classifiers, or iPIPE, without the needs for enumerating $Γ$ :

Iterate $t$ from $t_{L}$ to $t_{U}$ .
For each $t$ ,
1. Identify subset: Let $𝒵_{t} = \{k : {\hat{γ}}_{k}^{B} (t) = 0\}$ and $ℒ (𝒵_{t}) = \{j : x_{j} ≺ x_{k}, \exists k \in 𝒵_{t}\}$ . Define $C_{t} = 𝒵_{t} \cup ℒ (𝒵_{t}) ∖ 𝒟_{0 t}$ , where $𝒟_{0 t}$ the index set for ${\hat{γ}}_{k} (t)$ that has been determined 0 and is initially a null set.
2. Maximise: Evaluate the PIPE-classifiers ${\hat{γ}}_{C_{t}} (t)$ and set $\hat{γ} (t)$ equal to ${\hat{γ}}_{C_{t}} (t)$ on the subset $k \in C_{t}$ . Set ${\hat{γ}}_{k^{'}} (t) = 1$ for remaining $k^{'} \in {\overline{C}}_{t} ∖ 𝒟_{0 t}$ .
3. Sweep zeros: For all $k$ with ${\hat{γ}}_{k} (t) = 0$ , set ${\hat{γ}}_{k} (t^{'}) = 0$ for all $t^{'} \in [t, t_{U}]$ and add these indices to $𝒟_{0 t}$ .
Stop when ${\hat{γ}}_{k} (t) = 0$ for all $k \in {1, \dots, K}$ , and evaluate ${\hat{θ}}_{k} = min \{t : {\hat{γ}}_{k} (t) = 0\}$ according to (7).

For the sweep algorithm to yield the true maximiser $\hat{γ} (t^{'})$ for all $t^{'}$ , it requires:

Proposition 4. Step 2b yields the true maximiser $\hat{γ} (t)$ of $H (g; t)$ for the given $t$ .

As a consequence of Proposition 4 and Theorem 1, sweeping zeroes across the remaining $t^{'}$ (step 2c) yields the correct ${\hat{y}}_{k} (t^{'})$ for all $k \in 𝒟_{0 t}$ .

The core idea of the algorithm is to break down the maximisation problem into maximisation over subsets $(C_{t})$ . In particular, since we start with a small value of $t$ , the set $𝒵_{t}$ and hence $C_{t}$ are small at the beginning. As the algorithm iterates across $t$ , the set $𝒟_{0 t}$ of determined zeroes increases thus limiting the size of $𝒟_{t}$ and rendering the maximisation step feasible. Specifically, the computational cost in step 2b for each $t$ is $|C_{t}| \times |Γ (C_{t})|$ provided that $C_{t}$ is determined.

Note that as an easy corollary to Theorem 1, if ${\hat{γ}}_{k} (t^{'}) = 1$ , then ${\hat{γ}}_{k} (t) = 1$ for $t < t^{'}$ . Thus, one can define an analogous algorithm that starts with $t = t_{U}$ and sweep ones in the opposite direction.

3.2 |. Numerical illustration: clinical decisions for rehabilitation

We illustrate the sweep algorithm using a simulated data set in the context of evaluating clinical decisions for referring stroke patients to rehabilitation described in Section 1. For brevity in presenting the results, we consider only the four main factors in this subsection, i.e., having $D = 4$ and $K = 3^{4} = 81$ . Even in this reduced problem, there are 2⁸¹ possible classifier ensembles without the partial ordering constraint and enumerating the partially ordered set $Γ$ from the unconstrained space will be computationally prohibitive.

In the simulated data set, for each condition, we first generated the number $m_{k}$ of patients with that condition and then generated the number $y_{k}$ of patients referred to rehabilitation given $m_{k}$ , i.e., $y_{k} ~ binomial (m_{k}, θ_{k})$ where $θ_{k}$ was the referral probability for condition $k$ . To analyse the simulated data, we postulated a uniform prior on $θ_{k}$ for all $k$ , so that the unconstrained posterior distribution was $θ_{k} ~ beta (y_{k} + 1, m_{k} - y_{k} + 1)$ based on which each $p_{k} (t)$ was to be evaluated.

While the simulated data along with the specifications of $m_{k}$ and $θ_{k}$ and the analysis results are provided in the web-based supporting materials, Table 1 gives some intermediate steps of the sweep algorithm applied to the simulated data with $ϵ = 0.5$ , described as follows:

We first determine the lower limit $t_{L} = 0.019$ as the largest threshold (to the third decimal place) so that ${\hat{γ}}_{k}^{B} (t_{L}) = 1$ for all $k$ . We then iterate $t$ on a grid with increments 0.001.
When $t = 0.020$ , the condition $x_{k} = (0, 1, 0, 1)$ is associated with ${\hat{γ}}_{k}^{B} (t_{L}) = 0$ , and thus belongs to $Z_{0.020}$ ; see first row under the column $t = 0.020$ in Table 1. By partial ordering, the conditions (0, 0, 0, 0), (0, 1, 0, 0), and (0, 0, 0, 1) are included in $ℒ (𝒵_{0.020})$ ; rows 2–4 in the table. Thus, the set $C_{0.020}$ consists of 4 conditions. Applying (2) on the subset $C_{0.020}$ gives ${\hat{γ}}_{k} (0.020) = 1$ for all $k \in C_{0.020}$ .
When $t = 0.029$ , the set $C_{0.029}$ remains the same, although the condition (0, 0, 0, 0) now belongs to $𝒵_{0.029}$ and its associated ${\hat{γ}}_{k} (0.029) = 0$ by maximising $H_{C_{0.029}}$ over $Γ (C_{0.029})$ . As a result, this condition is added to $𝒟_{0 t}$ and its associated ${\hat{γ}}_{k} (t^{'})$ would be set at 0 for all $t^{'} \geq 0.029$ .
When $t = 0.036$ , the set $C_{0.036}$ consists of 3 conditions, all of which are determined to have ${\hat{γ}}_{k} (0.036) = 0$ in the maximisation step, and are added to $𝒟_{0 t}$ .
Table 1 further gives the intermediate results for $t = 0.043, 0.046, 0.049, 0.053, 0.056$ to illustrate how the set $C_{t}$ changes over the iteration. Overall, while the number of conditions belonging to $𝒵_{t}$ and $ℒ (𝒵_{t})$ grows as $t$ grows, the set $𝒟_{0 t}$ also grows. The size of $C_{t}$ in this example ranges from 0 to 9 for all $t \in [t_{L}, t_{U}]$ . Thus, the maximisation step (step 2b) is feasible computationally.
Note that the table only shows conditions that belong to $C_{t} \cup 𝒟_{0 t}$ at a given $t$ . Because ${\hat{γ}}_{k^{'}} (t) = 1$ for $k^{'} \in {\overline{C}}_{t} \cap {\bar{𝒟}}_{0 t}$ , they are not shown in the table to conserve space.
The iteration stops at $t = t_{U} = 0.98$ when ${\hat{γ}}_{k} (0.98) = 0$ for all $k \in {1, \dots, 81}$ .

TABLE 1.

Illustration of the sweep algorithm applied to simulated data with outcomes $y_{k} ~ binomial (m_{k}, θ_{k})$ for condition $x_{k}$ . For each $t$ , conditions that belong to $C_{t}$ are shown (under the columns ‘set’) along with the PIPE-classifier ${\hat{γ}}_{k} (t)$ per step 2b.

$x_{k}$	$m_{k}$	$y_{k}$	$t = 0.020$		$t = 0.029$		$t = 0.036$		$t = 0.043$		$t = 0.046$
			set	${\hat{γ}}_{k} (t)$	set	${\hat{γ}}_{k} (t)$	set	${\hat{γ}}_{k} (t)$	set	${\hat{γ}}_{k} (t)$	set	${\hat{γ}}_{k} (t)$
(0,1,0,1)	34	0	$𝒵_{t}$	1	$𝒵_{t}$	1	$𝒵_{t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,0,0,0)	23	0	$ℒ (𝒵_{t})$	1	$𝒵_{t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,1,0,0)	17	0	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,0,0,1)	26	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,0,1,2)	15	0							$𝒵_{t}$	1	$𝒵_{t}$	1
(0,0,1,0)	22	1							$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1
(0,0,1,1)	1	0							$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1
(0,0,0,2)	29	1							$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1
(1,0,0,0)	14	0									$𝒵_{t}$	0

$x_{k}$	$m_{k}$	$y_{k}$	$t = 0.049$		$t = 0.053$		$t = 0.056$
			set	${\hat{γ}}_{k} (t)$	set	${\hat{γ}}_{k} (t)$	set	${\hat{γ}}_{k} (t)$
(0,1,0,1)	34	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,0,0,0)	23	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,1,0,0)	17	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,0,0,1)	26	1	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,0,1,2)	15	0	$𝒵_{t}$	1	$𝒵_{t}$	1	$𝒵_{t}$	1
(0,0,1,0)	22	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1
(0,0,1,1)	1	0	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1
(0,0,0,2)	29	1	$ℒ (𝒵_{t})$	1	$ℒ (𝒵_{t})$	1	$𝒵_{t}$	0
(1,0,0,0)	14	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0	$𝒟_{0 t}$	0
(0,2,0,1)	33	1	$𝒵_{t}$	1	$𝒵_{t}$	0	$𝒟_{0 t}$	0
(0,2,0,0)	5	0			$ℒ (𝒵_{t})$	0	$𝒟_{0 t}$	0

Open in a new tab

3.3 |. Sequential random subset maximisation

The feasibility of the sweep algorithm depends on the size of $C_{t}$ at each $t$ . The sweep algorithm occasionally could get stuck at particular $t$ when $C_{t}$ is large. For a problem with large $D$ and $K$ , evaluating $Γ (C_{t})$ may not be feasible in general. For these large $D$ and large $K$ situations, we propose a sequential random subset maximisation (SRSM) method ${\hat{γ}}_{SRSM} (t)$ to approximate ${\hat{γ}}_{C_{t}} (t)$ defined in the sweep algorithm. First determine a subset size $K_{sub} \leq |𝒟_{t}|$ :

Select random subset: Randomly select a subset $𝒮_{t} \subseteq {\bar{𝒟}}_{t}$ , where $|𝒮_{t}| \leq K_{sub}$ and ${\bar{𝒟}}_{t} \subseteq 𝒟_{t}$ is the set associated with undetermined ${\hat{γ}}_{k} (t)$ on $C_{t}$ and is initially set as $C_{t}$ .
Maximise: Evaluate ${\hat{γ}}_{S_{t}} (t)$ and set ${\hat{γ}}_{SRSM} (t)$ equal to ${\hat{γ}}_{S_{t}} (t)$ on the subset $k \in 𝒮_{t}$ .
Impose partial orders: Let $ℐ_{t} \subseteq C_{t}$ be the set of indices associated with ${\hat{γ}}_{I_{t}} (t)$ whose values are implied by ${\hat{γ}}_{S_{t}} (t)$ by partial ordering. Update ${\hat{γ}}_{SRSM} (t)$ on $ℐ_{t}$ accordingly.
Update ${\bar{𝒟}}_{t} \leftarrow {\bar{𝒟}}_{t} ∖ 𝒮_{t} \cup I_{t}$ , and repeat step 1 until ${\bar{𝒟}}_{t} = \emptyset$ .
Set ${\hat{γ}}_{C_{t}} (t) = {\hat{γ}}_{SRSM} (t)$ when the algorithm ends.

While there is no theoretical guarantee the maximisation step (step 2) in SRSM will yield the true ${\hat{γ}}_{C_{t}} (t)$ , we may repeat the algorithm many times and select the classifiers with maximum $H_{C_{t}} ({\hat{γ}}_{SRSM} (t); t)$ .

SRSM is intended to be applied in conjunction with step 2b of the sweep algorithm. However, the method can be a stand-alone algorithm for the evaluation of $\hat{γ} (t)$ , by replacing $C_{t}$ with ${1, \dots, K}$ in the algorithm. Table 2 summarises the performance of the SRSM algorithm when applied to the simulated data in Section 3.2 for directly evaluating the full vector $\hat{γ} (t)$ at $t = 0.5$ . In this case, as we know the ground truth from the sweep algorithm, the table records how frequent the SRSM is correct for different values of $K_{sub}$ . As expected, the algorithm is correct more often with a larger $K_{sub}$ . In reality where we do not have the ground truth, we require the algorithm only to be correct at least once (instead of most of the time). Therefore, as long as there is a non-trivial likelihood of getting the correct $\hat{γ} (t)$ on each SRSM run, the probability of getting the correct answer upon repeated runs will be very high. In our example, the likelihood is quite high (at 38%) even when a small $K_{sub} = 5$ subset is sampled out of the $K = 81$ possible conditions.

TABLE 2.

Performance of the SRSM algorithm for evaluating $\hat{γ} (0.5)$ in the simulated rehab data ( $D = 4, K = 81$ ) using different $K_{sub}$ . The SRSM algorithm is repeated 100 times for each $K_{sub}$ . The ground truth is known based on the sweep algorithm.

$K_{sub}$	5	7	9	11	12	13	14	15	20
Number of correct classification	38	45	46	46	51	59	57	57	58

Open in a new tab

To illustrate how SRSM works with the sweep algorithm for a large $D$ , we consider another simulated data set that include all 7 factors in a rehabilitation chart review, i.e., a total of 648 conditions. The data generation model is described in Section 5.3 below. At $t = 0.06$ in the sweep algorithm, the set $C_{0.05}$ consists of 25 conditions. First, to obtain the ground truth ${\hat{γ}}_{C_{0.05}} (0.05)$ , we enumerated the entire set $Γ (C_{0.05})$ , identified 67,929 partially ordered classifier ensembles, and evaluated ${\hat{γ}}_{C_{0.05}} (0.05)$ per (2). Next, we applied SRSM with $K_{sub} = 5$ out of $|C_{0.05}| = 25$ possible conditions and repeated the algorithm 100 times. Five of the 100 repetitions yielded the true ${\hat{γ}}_{C_{0.05}} (0.05)$ . With $K_{sub} = 7$ , the number of correct classifications increased to 14. The SRSM algorithm with $K_{sub} = 7$ and 100 repetitions took less than a minute to run on a local computer, while it took a few hours to enumerate $Γ (C_{0.05})$ on the same machine.

4 |. APPLICATIONS

4.1 |. Estimating population parameters under partial ordering

iPIPE is directly applicable to the estimation of population parameters associated with partially ordered conditions. Generally, let $y_{k} = (y_{k 1}, y_{k 2}, \dots, y_{k, m_{k}})$ denote a random sample of size $m_{k}$ from a population with parameter $θ_{k}$ and nuisance parameters $ϑ_{k}$ under condition $x_{k}$ , where $θ = (θ_{1}, \dots, θ_{K})$ is nondecreasing in $x$ in terms of partial ordering. Then the PIPE-classifier (2) and its inverse (7) can be evaluated with $p_{k} (t) = E (θ_{k} > t ∣ y_{k})$ with respect to the posterior marginal distribution of $θ_{k}$ .

To illustrate, consider normal data $y_{k}$ with mean $μ_{k}$ and variance $σ_{k}^{2}$ . Unconstrained Bayesian inference about $μ_{k}$ and $σ_{k}$ may be performed independently for each $k$ based on a semiconjugate prior: $μ_{k} ~ N (0, τ_{0}^{2})$ and $f (σ_{k}) \propto σ_{k}^{- 1}$ . The posterior distribution of $μ_{k}$ and $σ_{k}$ can then be simulated from

μ_{k} ∣ σ_{k}, {\overline{y}}_{k} ~ N ({\hat{μ}}_{k}, {\hat{τ}}_{k}^{2}) and f (σ_{k} ∣ y_{k}) \propto {\hat{τ}}_{k} ϕ ({\hat{μ}}_{k} / τ_{0}) σ_{k}^{- 1} ℒ_{k} ({\hat{μ}}_{k}, σ_{k})

(9)

where ${\overline{y}}_{k}$ is the sample mean of the random sample, the function $ϕ$ denotes standard normal density, $ℒ_{k}$ is the normal likelihood given $y_{k}$ , and

{\hat{μ}}_{k} = \frac{m_{k} / σ_{k}^{2}}{1 / τ_{0}^{2} + m_{k} / σ_{k}^{2}} {\overline{y}}_{k} and {\hat{τ}}_{k}^{2} = {(1 / τ_{0}^{2} + m_{k} / σ_{k}^{2})}^{- 1} .

Suppose that $μ_{k}$ ’s are nondecreasing in $x$ in terms of partial ordering. i.e., setting $θ_{k} = μ_{k}$ with nuisance parameters $ϑ = (σ_{1}, \dots, σ_{K})$ . Then we have

p_{k} (t) = \int Φ (\frac{{\hat{μ}}_{k} - t}{{\hat{τ}}_{k}}) f (σ_{k} ∣ y_{k}) d σ_{k},

(10)

where $Φ$ is the standard normal distribution function. Instead of evaluating (10) numerically, we can compute $p_{k} (t)$ , for each given $t$ , by drawing $μ_{k}$ according to the posterior distribution (9).

In another example, consider binomial variable $y_{k}$ with size $m_{k}$ and response probability $θ_{k}$ under condition $k$ . Postulating a conjugate beta $(a, b)$ prior on the probability parameter, we can draw unconstrained inference about $θ_{k}$ based on the posterior distribution $beta (a + y_{k}, b + m_{k} - y_{k})$ , i.e., $p_{k} (t) = 1 - B e (t; a + y_{k}, b + m_{k} - y_{k})$ , where $B e (\cdot; a, b)$ is the distribution function of beta $(a, b)$ . To illustrate in a real data application, we consider data in 5,087 users in an app recommender study (Cheung et al., 2018). In this analysis, each user received app recommendations on a weekly basis and their usage was tracked over a four-week period. Specifically, in each of the four weeks, a user would receive $x_{k d} \in {0, 1, 2}$ app recommendations for $d = 1, 2, 3, 4$ . For illustration purposes, we consider a dichotomised response, defined as engaging an app in the system for at least three times during the week following the four-week period. The left panel of Figure 1 summarises the recommendation patterns $(x_{k})$ and the app use data $(m_{k}, y_{k})$ for each condition: In the 5,087 users, we observe a total $K = 29$ patterns.

The unconstrained estimates are obtained with uniform prior on each $θ_{k}$ (i.e., $a = b = 1$ ) and are plotted in the right panel of Figure 1, along with the iPIPE estimates (medians and 95% credible intervals) using the sweep algorithm. The impact of the partial ordering assumption is clearly demonstrated as the iPIPE estimates differ from the unconstrained estimates in two ways. First, iPIPE depicts which conditions are not effective with very low estimated response rates, whereas the unconstrained estimates are more variable across conditions. Second, the 95% credible intervals based on iPIPE are much narrower than those based on unconstrained estimation, particularly when $m_{k}$ is small.

4.2 |. Regression models

This subsection describes applications of iPIPE in regression models that include covariates or confounding variables as well as the multi-factor condition as predictor variables, for different outcome types. First consider linear regression for normal data:

y_{i} = \sum_{K}^{k = 1} α_{k} I ({condition}_{i} = x_{k}) + φ^{T} z_{i} + e_{i},

(11)

where $y_{i}$ and ${condition}_{i}$ respectively denote the response and the condition of subject $i$ with covariates $z_{i}$ and normal noise $e_{i}$ with variance $σ^{2}$ for $i = 1, \dots, n$ . Unconstrained Bayesian inference can be performed with the standard non-informative prior, which is uniform on $(α, φ, log σ)$ , or equivalently, $f (α, φ, σ^{2}) \propto σ^{- 2}$ (Gelman et al., 1995). However, when $K$ is large and the numbers of observations for some conditions are small, a proper prior on $α_{k}$ ’s may be used instead. Generally, it is easy to draw from the posterior distribution with respect to other prior distributions such as normal using standard software such as Rstan (Stan Development Team, 2021); see Section 5.1 for an example.

Under model (11), the effects of the conditions are expressed in terms of the intercepts $α_{k}$ ’s. If these intercepts are partially ordered according to the $K$ conditions, we can set $θ_{k} = α_{k}$ and the response surface $θ$ can be estimated by $\hat{θ}$ with $p_{k} (t) = E \{I (α_{k} > t) ∣ y_{1}, \dots, y_{n}\}$ for $k = 1, \dots, K$ , where the expectation is taken with respect to the unconstrained posterior distribution of $α_{k}$ . The nuisance parameters in this application are $ϑ = (φ, σ^{2})$ .

In some applications, one of the conditions (say $x_{1}$ ) is a control condition and the interest is in estimating the effect due to $x_{k}$ relative to $x_{1}$ , i.e., $α_{k} - α_{1}$ . As such, we may impose no constraint between $α_{1}$ and other $α_{k}$ ’s; rather, partial ordering is applied to $θ_{k} = α_{k} - α_{1}$ for $k = 2, \dots, K$ . That is, the response surface $θ = (θ_{2}, \dots, θ_{K})$ of interest has $K - 1$ parameters and can be estimated using iPIPE with $p_{k} (t) = E \{I (α_{k} - α_{1} > t) ∣ y_{1}, \dots, y_{n}\}$ for $k = 2, \dots, K$ , where the expectation is taken with respect to the unconstrained posterior distribution of $α_{k} - α_{1}$ . This illustrates how iPIPE can be applied to address constraints on different contrasts in model (11) in different applications, once the unconstrained posterior of $α_{k}$ ’s is obtained.

Generalised linear models are commonly used for regression analysis of non-normal response data, including logistic and probit regression for binary data, Poisson regression for count data, gamma regression for nonnegative continuous data, as well as linear regression. The expected response $E (y_{i})$ is $η$ -linear in the regression coefficients, i.e.,

η {E (y_{i})} ≔ η_{i} = \sum_{k = 1}^{K} α_{k} I ({condition}_{i} = x_{k}) + φ^{T} z_{i}

(12)

where $η$ is a known link function (McCullagh and Nelder, 1989). There is a large literature on Bayesian analysis of the generalised linear models. Ibrahim and Laud (1991), for example, give an in-depth discussion on the use of Jeffrey’s prior. Due to the large number $K$ of conditions, however, proper priors such as normal may be used for $α_{k}$ ’s; see Dellaportas and Smith (1993) who discuss an application using Gibbs sampling to compute the posterior. The response surface $θ$ is to be defined in terms of $α_{k}$ ’s according to the application and estimated using iPIPE in an analogous way as in the linear model described above.

For right-censored survival data, Cox’s model assumes that the hazard function at time $s$ is given by $λ_{0} (s) exp (η_{i})$ where $λ_{0} (s)$ is the baseline hazard and the conditions and covariates $z_{i}$ have multiplicative effects on the hazards via $η_{i}$ defined analogously to (12). While the focus of inference is often on the regression coefficients $(α, φ)$ , many Bayesian approaches to handle $λ_{0} (s)$ have been considered, including parametric models (Dellaportas and Smith, 1993), nonparametric prior process such as the gamma process (Burridge, 1981; Sinha and Dey, 1997), and avoiding modeling $λ_{0} (s)$ via the use of partial likelihood (Sinha et al., 2003). For example, the partial likelihood under proportional hazards is free of the nuisance $λ_{0}$ and can be expressed

ℒ (α, φ) = \prod_{i = 1}^{n} {\frac{exp (η_{i})}{\sum_{j \in ℛ (y_{i})} exp (η_{i})}}^{δ_{i}} = \prod_{i = 1}^{n} {[\frac{exp {\sum_{k = 1}^{K} α_{k} I ({condition}_{i} = x_{k}) + φ^{T} z_{i}}}{\sum_{j \in ℛ (y_{i})} exp {\sum_{k = 1}^{K} α_{k} I ({condition}_{j} = x_{k}) + φ^{T} z_{j}}}]}^{δ_{i}}

(13)

= \prod_{i = 1}^{n} {[\frac{exp {\sum_{k = 2}^{K} (α_{k} - α_{1}) I ({condition}_{i} = x_{k}) + φ^{T} z_{i}}}{\sum_{j \in ℛ (y_{i})} exp {\sum_{k = 2}^{K} (α_{k} - α_{1}) I ({condition}_{j} = x_{k}) + φ^{T} z_{j}}}]}^{δ_{i}}

(14)

where $y_{i}$ is the minimum of censoring time and survival time of subject $i, δ_{i}$ is the indicator of observing the survival time, and $ℛ (s)$ is the risk set at time $s$ . Note that when $λ_{0}$ is unspecified, and assuming that ${condition}_{i} \in \{x_{1}, \dots, x_{K}\}$ such that $\sum_{k = 1}^{K} I ({condition}_{i} = x_{k}) = 1$ , the right-hand-side of expression (13) is over-parameterised as the term $α_{1}$ can be absorbed into the baseline hazard function. Unconstrained Bayesian inference about $θ_{k} = α_{k} - α_{1} (and φ)$ can be based on partial likelihood (14) and the corresponding approximate posterior density $\propto ℒ (α, φ) f (α, φ)$ , which is a limiting marginal posterior of $(α, φ)$ under a fully Bayesian approach with a diffuse gamma process prior on $λ_{0}$ (Sinha et al., 2003). If the response surface $θ$ thus defined is assumed nondecreasing in $x$ in terms of partial ordering, it can be estimated using iPIPE based on the unconstrained posterior draws of $α_{k}$ ’s with $p_{k} (t) = E \{I (α_{k} - α_{1} > t) ∣ y_{1}, \dots, y_{n}\}$ for $k = 2, \dots, K$ in the same way as in the linear and generalised linear models. Note that if, in some applications, the response surface of interest is $θ_{k} ≔ α_{k}$ , a parametric form of $λ_{0}$ is needed to ensure identifiability; see for example Dellaportas and Smith (1993) who propose Gibbs sampling for proportional hazards model for Weibull survival time, where the full conditionals are simulated using adaptive rejection sampling (Gilks and Wild, 1992).

4.3 |. Covariate-dependent response surface

Model (11) can be extended to accommodate situations where the multi-factor condition interacts with the covariates, i.e., having

y_{i} = \sum_{k = 1}^{K} α_{k} I ({condition}_{i} = x_{k}) + φ^{T} z_{i} + \sum_{k = 1}^{K} I ({condition}_{i} = x_{k}) β_{k}^{T} z_{i} + e_{i}, for i = 1, \dots, n .

(15)

For the coefficients of the interaction terms, we may postulate $β_{1}, \dots, β_{K} ~ N (0, σ_{B}^{2} I)$ a priori, independently of the prior specified for the main effects $α_{k}$ ’s and $φ$ and the variance term $σ$ for model (11). A relatively informative prior, i.e., a small $σ_{B}$ , corresponds to the assumption that the effects of the $K$ conditions follow exactly the same order under all $z_{i}$ . In the special case when $β_{k} \equiv 0$ (i.e., a degenerative prior), the response surface $θ_{k} (z_{i}) = E (y_{i} ∣ I ({condition}_{i} = x_{k}, z_{i})$ will follow the same full order of $α_{k}$ ’s for each $z_{i}$ . Generally, the full order of a covariate-dependent response surface

θ (z_{i}) = (α_{1} + {(φ + β_{1})}^{T} z_{i}, α_{2} + {(φ + β_{2})}^{T} z_{i}, \dots, α_{K} + {(φ + β_{K})}^{T} z_{i}),

(16)

varies depending on $z_{i}$ even if it is subject to the same partial ordering constraint with respect to the condition $x$ . The response surface (16) can be estimated using iPIPE, for each given $z_{i}$ , with $p_{k} (t) = E (α_{k} + {(φ + β_{k})}^{T} z_{i} > t ∣ y)$ where the expectation is taken with respect to the unconstrained posterior distribution of $(α_{k}, φ, β_{k})$ . Applications with the generalised linear models and Cox’s model are analogous.

4.4 |. Hierarchical models for repeated measurements

In situations where an individual has repeated observations under different conditions, one may estimate the individual response surface using hierarchical models. Let $y_{i j}$ be the $j$ th measurement of individual $i$ under ${condition}_{i j}$ . An outcome model for these individuals can be expressed as

y_{i j} = \sum_{k = 1}^{K} α_{i k} I ({condition}_{i j} = x_{k}) + e_{i j}

(17)

where the individual effects $α_{i k}$ ’s of condition $x_{k}$ can be viewed as random effects that are potentially dependent with each other via underlying risk factors or confounding variables $z_{i}$ , and are possibly correlated with noise $e_{i j} ~ N (0, σ^{2})$ . As (17) is quite general, we illustrate in a concrete example where we examined the individual effects of sedentary breaks on cognitive performance in $n = 11$ participants. A sedentary break condition was defined by 2 factors over an 8-hour period: break frequency $x_{k 1}$ and duration $x_{k 2}$ . In addition to a control condition with no sedentary breaks denoted as $x_{1} = (x_{11}, x_{12}) = (0, 0)$ , each factor had two levels in the experiments:

low frequency ( $x_{k 1} = 1$ ; a break every 60 minutes) vs high frequency ( $x_{k 1} = 2$ ; a break every 30 minutes);
low break duration ( $x_{k 2} = 1$ ; 1 minute per break) vs high duration ( $x_{k 2} = 2$ ; 5 minutes per break).

Thus, each participant would be evaluated under $K = 5$ conditions on 5 different days with a 4–14 days washout period between conditions; see Table 3 for the list of condition $x_{k}$ for $k = 2, 3, 4, 5$ . We were interested in estimating the effects of sedentary break relative to the control condition in terms of the change in the Symbol Digit Modalities Test (SDMT) over an 8-hour period. While it was reasonable to assume that change in SDMT increases with each factor for each individual, we did not impose constraints between the control $x_{1}$ and other condition, as we were interested in estimating $θ_{i k} ≔ α_{i k} - α_{i 1}$ for $k = 2, 3, 4, 5$ for each $i$ .

TABLE 3.

Estimation of the population-level response surface $θ (z_{i})$ using the sedentary break data in 11 individuals: 6 men $(z_{i} = 0)$ and 5 women $(z_{i} = 1); m_{k}$ is the number of observations available for a given condition $k$ . For the unconstrained posterior and the constrained posterior, median (‘med’) of the posterior draws of $θ_{k}$ are reported along with the 0.025 and 0.975 posterior quantiles (‘95% int’). For iPIPE, the respective quantities are obtained by setting $ϵ = 0.5, 0.025, 0.975$ .

(a) Dose-response analysis results for men $(z_{i} = 0)$
Condition $k$	$(x_{k 1}, x_{k 2})$	$m_{k}$	Unconstrained		iPIPE, ${\hat{θ}}_{k}$		Constrained posterior
			med	95% int	med	95% int	med	95% int
2	(1,1)	6	2.85	(−2.37, 10.1)	0.94	(−4.53, 7.57)	−0.37	(−5.68, 4.84)
3	(1,2)	6	−0.12	(−5.33, 6.67)	0.94	(−4.53, 7.57)	1.94	(−3.26, 7.50)
4	(2,1)	6	−0.32	(−5.46, 6.54)	0.94	(−4.53, 7.57)	1.98	(−3.17, 7.43)
5	(2,2)	5	5.74	(−0.17, 12.4)	5.74	(−0.17, 12.4)	6.86	(1.22, 14.1)
(b) Dose-response analysis results for women $(z_{i} = 1)$
Condition $k$	$(x_{k 1}, x_{k 2})$	$m_{k}$	Unconstrained		iPIPE, ${\hat{θ}}_{k}$		Constrained posterior
			med	95% int	med	95% int	med	95% int
2	(1,1)	5	2.40	(−3.35, 9.73)	1.85	(−3.83, 9.74)	1.04	(−2.52, 2.15)
3	(1,2)	4	5.60	(−0.58, 12.82)	2.87	(−3.27, 10.7)	2.74	(1.10, 5.80)
4	(2,1)	5	1.56	(−4.48, 9.38)	1.85	(−3.83, 9.74)	1.33	(0.10, 4.45)
5	(2,2)	3	0.44	(−8.22, 7.01)	2.87	(−3.27, 10.7)	2.77	(2.74, 8.03)

Open in a new tab

We thus rewrite (17) as

y_{i j} = α_{i 1} + \sum_{k = 2}^{K} θ_{i k} I ({condition}_{i j} = x_{k}) + e_{i j}

(18)

and postulate that each $θ_{i}$ is nondecreasing in $x$ in terms of partial ordering. Under the parameterisation (18), $α_{i 1}$ indicates the mean response of participant $i$ under the control condition. For the individual-level parameters, we postulate a priori that:

$α_{11}, \dots, α_{n 1} ~ N (μ_{A}, σ_{A}^{2})$ ;
$θ_{i k} ~ N (ψ_{k} + ξ_{k} z_{i}, σ_{B}^{2})$ for $k = 2, 3, 4, 5$ for each individual $i$ , where $z_{i}$ is the gender of individual $i$ .

That is, the condition effects accounts for the gender in addition to the variability among individuals in the population. Further, the population-level parameters, namely $μ_{A}, σ_{A}, \{(ψ_{k}, ξ_{k}) : k = 2, 3, 4, 5\}, σ_{B}$ , and $σ$ , has the following prior distributions:

$μ_{A}$ has an improper flat prior, i.e., $f (μ_{A}) \propto 1$ ;
$ψ_{2}, ψ_{3}, ψ_{4}, ψ_{5} ~ N (0, 1000)$ ;
$ξ_{2}, ξ_{3}, ξ_{4}, ξ_{5} ~ N (0, 1000)$ ;
all variance parameters $(σ_{A}^{2}, σ_{B}^{2}, σ^{2})$ follow an inverse chi-squared distribution with 1 degree of freedom.

The “layer” of the population-level parameters in the hierarchical model facilitates pooling data from across individuals, but does not take advantage of partial ordering. To estimate a monotone individual response surface $θ_{i}$ , one could evaluate ${\hat{θ}}_{i}$ using iPIPE according to the PIPE-classifier ensemble (2) and its inverse (7) with $p_{i k} (t) = E \{I (θ_{i k} > t) ∣ y\}$ , where the expectation is taken with respect to the posterior of $θ_{i k}$ , for $k = 2, \dots, K$ .

In addition, we can make constrained inference about the population-level parameters using iPIPE. Specifically, we may define the population-level response surface

θ (z_{i}) = (θ_{2} (z_{i}), θ_{3} (z_{i}), θ_{4} (z_{i}), θ_{5} (z_{i})) ≔ (ψ_{2} + ξ_{2} z_{i}, ψ_{3} + ξ_{3} z_{i}, ψ_{4} + ξ_{4} z_{i}, ψ_{5} + ξ_{5} z_{i})

which is dependent on the covariate $z_{i}$ and is subject to the same partial ordering constraints as the individual $θ_{i k}$ ’s. Then, for each given $z_{i}$ , the response surface $θ (z_{i})$ can be estimated using iPIPE with $p_{k} (t) = E \{I (ψ_{k} + ξ_{k} z_{i} > t) ∣ y\}$ for $k = 2, \dots, K$ where the expectation is defined with respect to the unconstrained posterior of $(ψ_{k}, ξ_{k})$ .

The sedentary break data was fitted using model (18) with the above hierarchical structure using RStan with 4 chains each having 20,000 iterations after 5,000 warmup samples, and iPIPE was applied with $ϵ = 0.5$ (point estimate), 0.025, 0.975 (95% credible interval), and $w_{k} = m_{k}$ (number of observations for condition $x_{k}$ across individuals). As a comparison method, we also considered the constrained posterior that only included $θ_{i k}$ from the RStan samples that met partial ordering. Similar analyses were conducted for the population-level $θ_{k} (z_{i})$ ’s.

The analysis results for population-level response surface $θ (z_{i})$ are given in Table 3. When the unconstrained estimates do not violate any partial ordering constraint, the iPIPE estimates are similar to the unconstrained estimates; e.g., condition 4 in Table 3(a). When the unconstrained estimates violate partial ordering, iPIPE pools data from across conditions and equalises the estimates; e.g., in Table 3(b), iPIPE results in ${\hat{θ}}_{1} = {\hat{θ}}_{3}$ for conditions 1 and 3. In this regard, iPIPE behaves similarly to PAVA. In contrast, estimation based on the naive constrained posterior reverses the order of the unconstrained estimates for conditions 1 and 3. Similarly, the constrained posterior leads to significant conclusions (i.e., 95% credible interval excluding 0) for conditions 3 and 4 in Table 3(b), which could be an artifact of the constrained sampling scheme rather than the data.

Figure 2 displays the individual response surface estimates of 11 participants using the unconstrained posterior, iPIPE, and the constrained posterior. By imposing partial ordering, iPIPE (Figure 2(b)) reduces between-individual variability and produces estimates in a range similar to the unconstrained fits (Figure 2(a)). In contrast, the constrained posterior seems to lead to artificially exaggerated dose-response relationships at individual levels; e.g., see estimates for individual “f”.

5 |. SIMULATION STUDY

5.1 |. Simulation setting 1: $D = 2$ with equal sample size

In this section, we evaluate the performance of iPIPE when compared to other methods using simulation. The first simulation study examines situations with small $D$ and $K$ , namely, $D = 2$ with $x_{k 1} \in {1, 2, 3}$ and $x_{k 2} \in {1, 2}$ and $K = 6$ conditions. Each condition has equal number of $m_{k} = 5$ independent observations, generated from normal with mean $θ_{k}$ and a common standard deviation $σ = 2$ , where $θ_{k} = β_{1} \sqrt{x_{k 1}} + β_{2} x_{k 2} + β_{12} log (x_{k 1}) x_{k 2}$ . Four scenarios of $β s$ are considered in the simulation and are given in Table 4.

TABLE 4.

Simulation setting 1: $D = 2, K = 6$ with equal sample size

(a) Scenario 1: $β_{1} = 1, β_{2} = 2, β_{12} = 0$
Condition $k$		True	Unconstrained			iPIPE, ${\hat{θ}}_{k}$			Constrained posterior			PAVA
$x_{k 1}$	$x_{k 2}$	$θ_{k}$	bias	mse	cov	bias	mse	cov	bias	mse	cov	bias	mse
1	1	3.0	−0.07	0.79	0.96	−0.26	0.60	0.98	−0.90	1.17	0.77	−0.27	0.60
2	1	3.4	0.01	0.82	0.96	−0.01	0.43	0.99	−0.12	0.30	0.98	−0.01	0.43
3	1	3.7	0.04	0.84	0.96	0.23	0.60	0.98	0.67	0.81	0.86	0.23	0.59
1	2	5.0	−0.10	0.75	0.95	−0.27	0.53	0.98	−0.69	0.75	0.87	−0.27	0.54
2	2	5.4	−0.02	0.81	0.96	−0.04	0.41	0.99	0.07	0.24	0.98	−0.05	0.41
3	2	5.7	0.01	0.83	0.95	0.24	0.60	0.98	0.93	1.19	0.79	0.23	0.59
(b) Scenario 2: $β_{1} = 1, β_{2} = 2, β_{12} = - 0.25$
Condition $k$		True	Unconstrained			iPIPE, ${\hat{θ}}_{k}$			Constrained posterior			PAVA
$x_{k 1}$	$x_{k 2}$	$θ_{k}$	bias	mse	cov	bias	mse	cov	bias	mse	cov	bias	mse
1	1	3.0	−0.05	0.80	0.95	−0.29	0.57	0.97	−1.00	1.30	0.76	−0.29	0.57
2	1	3.2	−0.01	0.77	0.96	−0.01	0.36	0.99	−0.11	0.20	0.99	−0.01	0.36
3	1	3.5	0.03	0.77	0.96	0.26	0.52	0.99	0.67	0.81	0.85	0.26	0.51
1	2	5.0	−0.05	0.82	0.95	−0.33	0.61	0.97	−0.93	1.11	0.77	−0.34	0.61
2	2	5.1	−0.03	0.79	0.96	−0.01	0.34	0.99	0.07	0.20	0.99	−0.01	0.34
3	2	5.2	0.01	0.78	0.97	0.31	0.54	0.98	1.06	1.39	0.66	0.30	0.54
(c) Scenario 3: $β_{1} = 0, β_{2} = 2, β_{12} = 0.20$
Condition $k$		True	Unconstrained			iPIPE, ${\hat{θ}}_{k}$			Constrained posterior			PAVA
$x_{k 1}$	$x_{k 2}$	$θ_{k}$	bias	mse	cov	bias	mse	cov	bias	mse	cov	bias	mse
1	1	2.0	−0.10	0.81	0.95	−0.36	0.62	0.97	−1.08	1.46	0.71	−0.37	0.62
2	1	2.1	−0.02	0.85	0.95	−0.05	0.36	0.99	−0.12	0.24	0.98	−0.05	0.36
3	1	2.2	0.00	0.87	0.95	0.31	0.59	0.98	0.87	1.09	0.81	0.31	0.59
1	2	4.0	−0.05	0.78	0.97	−0.27	0.59	0.98	−0.74	0.93	0.85	−0.27	0.59
2	2	4.3	−0.01	0.76	0.96	−0.02	0.36	0.99	0.10	0.28	0.99	−0.02	0.36
3	2	4.4	0.06	0.75	0.96	0.30	0.55	0.97	1.03	1.35	0.71	0.30	0.54
(d) Scenario 4: $β_{1} = 1, β_{2} = 0, β_{12} = 0.50$
Condition $k$		True	Unconstrained			iPIPE, ${\hat{θ}}_{k}$			Constrained posterior			PAVA
$x_{k 1}$	$x_{k 2}$	$θ_{k}$	bias	mse	cov	bias	mse	cov	bias	mse	cov	bias	mse
1	1	1.0	0.02	0.78	0.95	−0.30	0.56	0.97	−1.13	1.58	0.65	−0.30	0.56
2	1	1.8	−0.06	0.82	0.96	−0.21	0.42	0.99	−0.58	0.50	0.95	−0.22	0.42
3	1	2.3	0.05	0.80	0.96	0.05	0.41	0.99	0.19	0.24	0.98	0.05	0.42
1	2	1.0	−0.01	0.74	0.96	0.19	0.44	0.99	0.14	0.28	0.99	0.18	0.44
2	2	2.1	0.03	0.84	0.95	0.11	0.43	0.99	0.37	0.33	0.95	0.11	0.43
3	2	2.8	−0.07	0.66	0.98	0.16	0.47	0.99	0.99	1.22	0.71	0.16	0.47

Open in a new tab

mse, mean squared error; cov, coverage probability.

Unconstrained estimates of $θ_{k}$ are obtained by fitting a Bayesian linear model with $θ_{1}, \dots, θ_{6} ~ N (0, 1000)$ and $σ^{2}$ following an inverse chi-squared distribution with 1 degree of freedom. We ran 4 chains in RStan each having 1,250 iterations after discarding 1,000 warmup samples (i.e., having 5,000 samples from the unconstrained posterior). The iPIPE point estimates are then obtained with $ϵ = 0.5$ and interval estimates with $ϵ = 0.025$ and 0.975. We also evaluate the constrained posterior estimates that include only posterior draws (out of 5,000 total) that meet partial ordering. In addition, we consider the frequentist PAVA as a comparison method.

The estimation properties of these methods are compared in Table 4. Overall, iPIPE consistently yields smaller mean squared error (mse) than the unconstrained estimates; the efficiency gain in terms of mse is quite substantial and results in reduction greater than 50% under some scenarios. The bias of iPIPE is small relative to mse. In contrast, the constrained posterior median can yield large bias, which in turn result in a large mse. Additionally, even though it is feasible to evaluate the constrained posterior estimates in this simulation because of low dimension, the proportions of draws that meet partial ordering are low, with mean ranging from 4% to 6% in the scenarios considered.

Finally, it is noteworthy that the estimation properties (bias and mse) of iPIPE are very similar to PAVA, while the former naturally produces interval estimation for inference. The simulation shows that 95% credible intervals of iPIPE achieve nominal (if conservative) coverage probability, whereas biases of the constrained posterior estimates lead to lower-than-nominal coverage probability.

5.2 |. Simulation setting 2: $D = 4$ with uneven sample sizes

The second simulation study examines iPIPE when $D = 4$ with $x_{k d} \in {1, 2}$ and $K = 16$ conditions with unequal sample sizes $(m_{k})$ . This represents situations with sparse sampling of some conditions. A binomial outcome $y_{k}$ was generated with size $m_{k}$ and probability $θ_{k}$ for condition $k$ in each simulation replicate. Unconstrained estimates of $θ_{k}$ are obtained as the median of the beta posterior assuming a uniform prior. Correspondingly, iPIPE estimates are obtained with $ϵ = 0.5$ and $w_{k} = m_{k}$ . In this case, it is infeasible to evaluate constrained posterior by using only draws that meet partial ordering, because the partial ordering constraints are restrictive and that acceptance rate is extremely small. Similarly, it is not straightforward to implement PAVA in this setting.

Figure 3 plots the distributions of the unconstrained and iPIPE point estimates under a given $θ$ based on 1,000 simulation replicates. Overall, the iPIPE estimates exhibit smaller variability than the unconstrained estimates, without inducing noticeable bias. Reduction in variability is pronounced when $m_{k}$ is small; for example, $x_{k} = (2, 2, 2, 2)$ and = (1, 1, 1, 2) when $m_{k} = 5$ . Additional simulation scenarios (given in web-based supporting material) confirm similar observations.

In addition to point estimates, 95% credible intervals are obtained for the unconstrained method and iPIPE. The coverage probabilities for all 16 conditions range 0.96 to 0.99 for iPIPE, and 0.94 to 0.98 for unconstrained. While the iPIPE estimates appear to be more conservative, it has narrower intervals on average: average width over all 16 conditions is 0.25 for iPIPE, compared to 0.30 for the unconstrained method.

5.3 |. Simulation setting 3: $D = 7$ with uneven sample sizes

In this subsection, we conduct simulation for settings with $D = 7$ and $K = 648$ with binomial outcomes as described in Section 1. In the simulated data, for condition $k$ , we first generate $m_{k} = [34 u_{k}] + 1$ where $u_{k}$ ’s are independent uniform(0,1) and keep $m_{k}$ fixed across simulation replicates. In each replicate, we draw $y_{k} ~ binomial (m_{k}, θ_{k})$ where

logit (θ_{k}) = - 5 + 1.5 {(\sum_{d = 1}^{7} x_{k d})}^{0.5} + 0.75 x_{k 1}^{2} + 0.5 x_{k 2} x_{k 3} .

(19)

We analyse each simulated data set using iPIPE implemented by the sweep/SRSM algorithm with $K_{sub} = 5$ and 100 repetitions, as well as unconstrained posterior estimate, with $θ_{k} ~ uniform (0, 1)$ a priori. Figure 4 shows the results based on 100 simulated data sets. Both methods have similar magnitude of bias. However, the unconstrained median demonstrates a clear trend of positive bias when the true $θ_{k}$ is small and negative bias when the true $θ_{k}$ is large, when $m_{k}$ is small so that the uniform prior has large influence. In contrast, biases of iPIPE are much attenuated for conditions with small $m_{k}$ . This translates into smaller mse by iPIPE, noticeably when $m_{k}$ is small.

Figure 4. — Simulation setting 3: $D = 7, K = 648$ with unequal sample sizes and binomial outcomes. Each circle in the figures represent a condition and the size of the circle is proportional to $m_{k}$ .

The respective average coverage probabilities for the unconstrained method and iPIPE are 0.95 and 0.99. While iPIPE appears to be conservative, it achieves the nominal level with a higher precision (average interval width 0.33) than the unconstrained method (average interval width 0.38).

5.4 |. Simulation setting 4: The effect of dimension $D$

Finally, we consider settings with conditions defined by $D$ binary factors, i.e., $x_{k d} \in 1, 2$ for each $d = 1, \dots, D$ , where $D = 10, 11, 12$ , with total sample size $N = 5000$ or 10000. An objective of this simulation study is to examine the impact of $D$ on the performance of iPIPE. There are $2^{D}$ possible conditions for a given $D$ . We first generate $m_{k}$ ’s from a multinomial distribution with size $N$ and probability $1 / 2^{D}$ for each $k$ ; and keep $m_{k}$ ’s fixed across simulation replicates. In each replicate, we generate a random sample of size $m_{k}$ under condition $k$ from an exponential distribution with rate $θ_{k}$ , i.e., $f (y_{k j} ∣ θ_{k}) = θ_{k} e^{- θ_{k} y_{k j}}$ for $y_{j k} > 0$ and $j = 1, \dots, m_{k}$ , where

log θ_{k} = - 2.5 + \sum_{d = 1}^{D} \frac{x_{k d}}{2^{d + 1}} + \frac{x_{k 1}}{2 D} \sum_{d = 1}^{D} \frac{x_{k d}}{d} + \frac{x_{k 2}}{2 \sqrt{2 D}} \sum_{d = 1}^{D} \frac{x_{k d}}{\sqrt{d}} + 0.125 {(\prod_{d = 3}^{D} x_{k d})}^{1 / (D - 2)} .

(20)

For the unconstrained Bayesian inference, we postulate $θ_{k}$ ’s to be exchangeable Gamma variables with shape 0.1 and scale 10 a priori, so the posterior is Gamma with shape $m_{k} + 1$ and scale $10 / (1 + 10 \sum_{j = 1}^{m_{k}} y_{k j})$ . The iPIPE point estimates are then obtained with $ϵ = 0.5$ and interval estimates with $ϵ = 0.025$ , and 0.975 using the sweep/SRSM algorithm with $K_{sub} = 7$ and 50 repetitions. iPIPE and the unconstrained Bayesian inference are evaluated based on 50 simulation replicates.

Both methods have similar magnitude of bias, which seems to remain in range as $D$ increases (Figure 5). In contrast, iPIPE has much smaller mse than unconstrained inference especially as $D$ increases (Figure 6). We note that these are simulation scenarios with sparse observations. For example, when $D = 12$ and $N = 5000$ , there are $K = 2886$ conditions with at least one observation, i.e., about 1.7 observations per condition. Hence, iPIPE improves upon the unconstrained inference by reducing variability, while keeping bias similar.

Figure 6. — Simulation setting 4: Root mean squared error vs true $θ_{k}$ for $D = 10, 11, 12$ and $N = 5000, 10000$ . Each circle in the figures represent a condition and the size of the circle is proportional to $m_{k}$ .

Table 5 shows the aggregate performance of the two methods in terms of bias, mse, coverage probability, and mean width of 95% credible intervals averaged across all $θ_{k}$ ’s for each pair $(D, N)$ . The average biases are small relative to mse and remain range bound. As expected, mse increases as $D$ increases, as the situations reflects increasingly sparse observations per condition. However, the performance of iPIPE relative to unconstrained inference improves quite substantially as $D$ increases, indicating the information contained in the partial ordering assumption. Additionally, by doubling $N$ from 5000 to 10000, the average mse is reduced by 40% to 50% for all $D$ considered. For iPIPE, we also examine the estimation properties for the PIPE-classifier ensemble $γ_{k} (t)$ by ${\hat{γ}}_{k} (t)$ in terms of average classification error (ACE) across all $K$ conditions, defined as

A C E = \frac{1}{K} \sum_{k = 1}^{K} proportion of {{\hat{γ}}_{k} (t) \neq γ_{k} (t)} .

(21)

TABLE 5.

Simulation setting 4: $D = 10, 11, 12$ with sparse sampling

$D$	$N$	$K$	Unconstrained				iPIPE, ${\hat{θ}}_{K}$
			bias	mse	cov	width	bias	mse	cov	width	$\bar{A C E}$
10	5000	1014	0.18	1.10	0.96	3.2	0.067	0.13	0.99	2.40	0.029
	10000	1024	0.095	0.45	0.95	1.93	0.087	0.092	0.99	1.72	0.026
11	5000	1876	0.31	1.94	0.96	5.03	−0.021	0.18	0.99	3.16	0.039
	10000	2034	0.20	1.15	0.96	3.15	0.034	0.12	0.99	2.35	0.029
12	5000	2886	0.38	2.42	0.97	6.50	−0.14	0.26	0.99	3.75	0.056
	10000	3746	0.31	1.96	0.96	5.02	−0.081	0.17	0.99	3.03	0.042

Open in a new tab

$K$ indicates the number of conditions with $m_{k} > 0$ .

bias, average bias; mse, average mean squared error; cov, average coverage probability; width, average mean width of 95% credible intervals; $\bar{A C E}$ , average of ACE (21) over 50 simulation replicates.

Table 5 reports the average $ACE (\bar{A C E})$ calculated by averaging ACE across all simulation replicates. $\bar{A C E}$ correlates with the average mse for estimating $θ$ : it increases as $D$ increases and $N$ decreases, but is low in all the scenarios we considered. One can improve $\bar{A C E}$ and estimation accuracy by increasing $K_{sub}$ and the number of repetitions in the SRSM algorithm. For example, we ran simulation using 100 repetitions (instead of 50) in SRSM for the scenario $D = 12$ and $N = 5000$ and obtained $\bar{A C E} = 0.052$ (vs. 0.056) and an average mse of 0.24 (vs. 0.26). While the improvement is minimal because 50 repetitions seem to be adequate, it suggests $\bar{A C E}$ is a good proxy for the adequacy of the SRSM algorithm. An advantage of using $\bar{A C E}$ over the use of mean squared errors is that $\bar{A C E}$ takes values on [0, 1], thus providing an index that facilitates benchmarking and interpretation of the accuracy of the method.

Finally and importantly, credible intervals of both methods achieve the nominal level, although iPIPE seems to be conservative and at the same time reduces the average width of the intervals. This indicates iPIPE retains its accuracy in quantifying uncertainty even under these sparse settings.

6 |. DISCUSSION

In this article, we deal with the estimation of monotone response surface defined on multiple factors and observed on a large number $K$ of distinct conditions. We make two main contributions. First, we have proposed an estimation method, called iPIPE, by inverting a partially ordered classifier ensemble (PIPE-classifiers). While the PIPE-classifiers are motivated with a decision theoretic framework with a classification-type gain function, they may be viewed as a projection of Bayes classifiers on the constrained space of partial ordering. iPIPE is nonparametric in that the method does not rely on any assumptions (e.g. additivity and smoothness) other than monotonicity. In our data examples and simulation, we have demonstrated that point estimation based on iPIPE behaves similarly to PAVA. The Bayesian decision-theoretic framework facilitates interval estimation and we have demonstrated that iPIPE-based 95% credible intervals achieve the frequentist coverage probability, and in fact, are conservative in our simulation scenarios. Such conservativeness interestingly comes with higher precision (shorter widths) when compared to the unconstrained Bayesian inference, and will warrant further investigation. Additionally, simulation results consistently show iPIPE has smaller mse than unconstrained estimation; and efficiency gain is particularly substantial when sampling of conditions is sparse. Also, iPIPE is versatile. It can be applied with many common statistical models described in Section 4, and it is potentially applicable to advanced semi-parametric models such as in spatiotemporal modeling; e.g., in Gaussian processes for spatial data (e.g., Banerjee et al. (2008); Datta et al. (2016)), the estimation of the cross-covariance function may be improved using iPIPE as the covariance between two locations is conceivably non-increasing in the distance between the locations. This represents an interesting and important line of future research.

Second, we have proposed algorithms that render iPIPE computationally feasible for estimating moderate-to-high dimension response surface, while the existing literature of estimating multivariate monotone functions has discussed little on situations with $D > 4$ . Specifically, we have proposed a sweep algorithm and have proved that it gives the true iPIPE $\hat{θ}$ defined in (7). At first glance, estimation by inverting a classification problem is more computationally intensive than the classification problem itself, because it involves iterating a threshold $t$ on a fine grid and it has to solve the classifiers for each $t$ . The sweep algorithm is interesting in that it takes advantage of the iteration step together with a sweep step (step 2c) to reduce the optimisation problem in classification (PIPE) into a more manageable subset maximisation step (step 2b). That is, the sweep algorithm integrates the two problems: estimation (iPIPE) can be a means to evaluating the classifiers (PIPE), while the former is in principle constructed by evaluating the latter at all thresholds. We have also proposed a sequential random subset maximisation (SRSM) algorithm to supplement the sweep algorithm. The idea of SRSM is to further reduce the subset maximisation step in sweep (step 2b) into even smaller computation tasks over sequentially selected random subsets. While there is no theoretical guarantee of giving the true maximisers, the SRSM method identifies the true maximisers in our data illustrations; and its likelihood of success can be enhanced by running the algorithms many times. We have applied the sweep/SRSM algorithm to analyse simulated data with $D \geq 10$ factors and $K > 1000$ conditions. We note that the computational costs of SRSM grow linearly in $K$ , as opposed to $| Γ |$ . In addition, while SRSM performs computation tasks over subsets of conditions sequentially, a possible alternative is to perform maximisation of each subset in parallel, and then pool and harmonise the results with respect to the constraint. Subset maximisation thus naturally lends itself to a divide-and-conquer approach (Guhaniyogi and Banerjee, 2018; Jordan et al., 2019), which can be implemented in parallel on multi-core machines or high-performance computing clusters. As such, the method can be scaled to address massive problems due to large dimension $D$ by leveraging the underlying computational architecture.

Supplementary Material

final submitted supplement

NIHMS1970282-supplement-final_submitted_supplement.pdf^{(478.2KB, pdf)}

acknowledgements

This work was supported by NIH grants R01HL153642, R01MH109496, and UL1TR001873. This work was also supported by the Robert N. Butler Columbia Aging Center of Columbia University.

Appendix

| Appendix A: Proof of Lemma 1

For brevity, we will omit the threshold $t$ in the proof of Lemma 1. Restating Lemma 1, we aim to prove ${\hat{γ}}_{i 1 k} \geq {\hat{γ}}_{i 0 k}$ for all $k \in {1, \dots, K}$ for a given $i$ .

First, since ${\hat{γ}}_{i 0} \in Γ_{i 0}$ and ${\hat{γ}}_{i 1} \in Γ_{i 1}$ , we have ${\hat{γ}}_{i 1 i} = 1 > 0 = {\hat{γ}}_{i 0 i}$ by definition.

Next, partition the set ${1, \dots, K}$ into $ℒ_{i} = \{k : x_{k} ≺ x_{i}\}, 𝒰_{i} = \{k : x_{k} > x_{i}\}$ , and $ℰ_{i} = {1, \dots, K} ∖ L_{i} \cup U_{i} \cup {i}$ . Because ${\hat{γ}}_{i 0 i} = 0$ , we have ${\hat{γ}}_{i 0 k} = 0$ for $k \in ℒ_{i}$ by monotonicity; and since ${\hat{γ}}_{i 1 k} \in {0, 1}$ , we have ${\hat{γ}}_{i 1 k} \geq 0 = {\hat{γ}}_{i 0 k}$ on $ℒ_{i}$ . Similarly, we observe that ${\hat{γ}}_{i 1 k} = 1$ for $k \in 𝒰_{i}$ because ${\hat{γ}}_{i 1 i} = 1$ , and hence ${\hat{γ}}_{i 1 k} \geq {\hat{γ}}_{i 0 k}$ on $𝒰_{i}$ . Further split $ℰ_{i}$ into two sets: $ℰ_{i 0} = \{k \in ℰ_{i} : {\hat{γ}}_{i 0 k} = 0\}$ and $ℰ_{i 1} = \{k \in ℰ_{i} : {\hat{γ}}_{i 0 k} = 1\}$ . By the definition of $ℰ_{i 0}$ , we have ${\hat{γ}}_{i 0 k} = 0 \leq {\hat{γ}}_{i 1 k} \in {0, 1}$ for $k \in ℰ_{i 0}$ . The proof of Lemma 1 will be completed by proving:

Claim 1. ${\hat{γ}}_{i 1 k} = 1$ for $k \in ℰ_{i 1}$ .

Recall that $g_{ℰ_{i 1}} = (g_{k} : k \in ℰ_{i 1})$ denotes the subvector $g$ on $ℰ_{i 1}$ , and suppose $g_{ℰ_{i 1}} \in Γ (ℰ_{i 1})$ . Construct a classifier ensemble ${\tilde{γ}}_{i 0} = ({\tilde{γ}}_{i 01}, \dots, {\tilde{γ}}_{i 0 K})$ as follows:

{\tilde{γ}}_{i 0 k} = {\hat{γ}}_{i 0 k} for k \in {\bar{ℰ}}_{i 1} = ℒ_{i} \cup 𝒰_{i} \cup ℰ_{i 0} \cup {i} and {\tilde{γ}}_{i 0 ℰ_{i 1}} = g_{ℰ_{i 1}} .

(22)

Claim 2. ${\tilde{γ}}_{i 0} \in Γ$ . Hence, ${\tilde{γ}}_{i 0} \in Γ_{i 0}$ because ${\tilde{γ}}_{i 0 i} = 0$ .

Proof of Claim 2: Since ${\hat{γ}}_{i 0} \in Γ_{i 0} \subset Γ$ , we have ${\hat{γ}}_{i 0 {\overline{ε}}_{i 1}} \in Γ ({\bar{ℰ}}_{i 1})$ . Thus, ${\tilde{γ}}_{i 0 {\overline{ε}}_{i 1}} = {\hat{γ}}_{i 0 {\overline{ε}}_{i 1}} \in Γ ({\bar{ℰ}}_{i 1})$ . Also by (22), we have ${\tilde{γ}}_{i 0 ℰ_{i 1}} = g_{ℰ_{i 1}} \in Γ (ℰ_{i 1})$ . That is, partial ordering of ${\tilde{γ}}_{i 0}$ holds within ${\bar{ℰ}}_{i 1}$ and $ℰ_{i 1}$ . To prove Claim 2, it remains to show that partial ordering of ${\tilde{γ}}_{i 0}$ holds for every pair $k \in ℰ_{i 1}$ and $k^{'} \in {\bar{ℰ}}_{i 1}$ :

First, consider the case $k^{'} \in ℒ_{i} \cup ℰ_{i 0} \cup {i}$ where ${\hat{γ}}_{i 0 k^{'}} = 0$ ; and recall that ${\hat{γ}}_{i 0 k} = 1$ for $k \in ℰ_{i 1}$ . Since ${\hat{γ}}_{i 0} \in Γ$ , partial ordering will hold between ${\hat{γ}}_{i 0 k^{'}}$ and ${\hat{γ}}_{i 0 k}$ implying that $x_{k} ≮ x_{k^{'}}$ . Because ${\tilde{γ}}_{i 0 k^{'}} = {\hat{γ}}_{i 0 k} = 0, g_{k}$ can take on any value while partial ordering will hold between $g_{k}$ and ${\tilde{γ}}_{i 0 k^{'}}$ .
Second, consider the case $k^{'} \in 𝒰_{i}$ where ${\hat{γ}}_{i 0 k^{'}} = 1$ . Note that $x_{k^{'}} ≮ x_{k}$ : because $x_{k^{'}} ≺ x_{k}$ would imply $x_{i} ≺ x_{k}$ , which would in turn put $k \in 𝒰_{i}$ by definition of $𝒰_{i}$ . As a result, because ${\tilde{γ}}_{i 0 k^{'}} = {\hat{γ}}_{i 0 k^{'}} = 1, g_{k}$ can take on any value while partial ordering will hold between $g_{k}$ and ${\tilde{γ}}_{i 0 k^{'}}$ .

This completes the proof of Claim 2.

Next, write $H (g; t) = \prod_{k = 1}^{K} ϕ_{k}^{g_{k}} ρ_{k}^{1 - g_{k}}$ where

ϕ_{k} = ϕ_{k} (t) = {\{ϵ p_{k} (t)\}}^{w_{k}} and ρ_{k} = ρ_{k} (t) = {\{(1 - ϵ) (1 - p_{k} (t))\}}^{w_{k}},

(23)

i.e., dependence on $t$ is omitted for brevity. Then we have

H ({\hat{γ}}_{i 0}; t) = ρ_{i} \prod_{k \in ℒ_{i}} ρ_{k} \prod_{k \in 𝒰_{i}} ϕ_{k}^{{\hat{γ}}_{i 0 k}} ρ_{k}^{1 - {\hat{γ}}_{i 0 k}} \prod_{k \in ℰ_{i 0}} ρ_{k} \prod_{k \in ℰ_{i 1}} ϕ_{k}

(24)

based on the definitions of $ℒ_{i}, ℰ_{i 0}$ , and $ℰ_{i 1}$ . Similarly, we can write

H ({\tilde{γ}}_{i 0}; t) = ρ_{i} \prod_{k \in ℒ_{i}} ρ_{k} \prod_{k \in 𝒰_{i}} ϕ_{k}^{{\hat{γ}}_{i 0 k}} ρ_{k}^{1 - {\hat{γ}}_{i 0 k}} \prod_{k \in ℰ_{i 0}} ρ_{k} \prod_{k \in ℰ_{i 1}} ϕ_{k}^{g_{k}} ρ_{k}^{1 - g_{k}}

(25)

where $g_{ℰ_{i 1}} \in Γ (ℰ_{i 1})$ . Because ${\hat{γ}}_{i 0}$ maximises $H$ on $Γ_{0 i}$ by definition (8) and ${\tilde{γ}}_{i 0} \in Γ_{0 i}$ per Claim 2, we have $({\hat{γ}}_{i 0}; t) \geq H ({\tilde{γ}}_{i 0}; t)$ . Applying this with (24) and (25), we obtain the inequality

\prod_{k \in ℰ_{i 1}} {(\frac{ϕ_{k}}{ρ_{k}})}^{1 - g_{k}} \geq 1

(26)

for any $g_{ℰ_{i 1}} \in Γ (ℰ_{i 1})$ .

Finally, suppose that ${\hat{γ}}_{i 1 k} = 0$ for some $k \in ℰ_{i 1}$ where ${\hat{γ}}_{i 1}$ maximises $H$ on $Γ_{i 1}$ per (8). Construct an ensemble ${\tilde{γ}}_{i 1} \in {0, 1}^{K}$ as follows: define

{\tilde{γ}}_{i 1 k} = \{\begin{matrix} {\hat{γ}}_{i 1 k} & for k \in {\bar{ℰ}}_{i 1} \\ 1 & for k \in ℰ_{i 1} . \end{matrix}

(27)

Using similar arguments used in the proof of Claim 2 above, we can show that ${\tilde{γ}}_{i 1} \in Γ_{i 1}$ . Further since ${\hat{γ}}_{i 1} \in Γ$ , the subvector ${\hat{γ}}_{i 1 ℰ_{i 1}} \in Γ (ℰ_{i 1})$ . Then we obtain

\frac{H ({\tilde{γ}}_{i 1}; t)}{H ({\hat{γ}}_{i 1}; t)} = \prod_{k \in ℰ_{i 1}} {(\frac{ϕ_{k}}{ρ_{i}})}^{1 - {\hat{γ}}_{i 1 k}} \geq 1.

(28)

The inequality in (28) is a result of (26) and the fact that ${\hat{γ}}_{i 1 ε_{i 1}} \in Γ (ℰ_{i 1})$ . However, the inequality (28) contradicts the definition of ${\hat{γ}}_{i 1}$ as the maximiser of $H$ on $Γ_{i 1}$ , except when ${\hat{γ}}_{i 1 k} = 1$ for all $k \in ℰ_{i 1}$ . Thus, by contradiction, ${\hat{γ}}_{i 1 k} \neq 0$ for any $k \in ℰ_{i 1}$ . This completes the proof of Claim 1 and Lemma 1.

| Appendix B: Proofs of Theorem 1, Proposition 3, and Proposition 4

Proof of Theorem 1: First consider a fixed $i$ as in Lemma 1. The maximiser $\hat{γ} (t)$ of $H (g; t)$ over $Γ$ will be either ${\hat{γ}}_{i 0} (t)$ or ${\hat{γ}}_{i 1} (t)$ , the respective maximisers over $Γ_{i 0}$ and $Γ_{i 1}$ . Specifically,

{\hat{γ}}_{i} (t) = 0 ⟺ \hat{γ} (t) = {\hat{γ}}_{i 0} (t)

(29)

⟺ H ({\hat{γ}}_{i 0} (t); t) > H ({\hat{γ}}_{i 1} (t); t) .

(30)

Equation (29) holds because ${\hat{γ}}_{i 0} (t) \in Γ_{0 i}$ and expanding on (30) gives

\prod_{k \in {i} \cup ℒ_{i}} ρ_{k} (t) \prod_{k \in 𝒰_{i} \cup ℰ_{i}} ϕ_{k} {(t)}^{{\hat{γ}}_{i 0 k}} ρ_{k} {(t)}^{1 - {\hat{γ}}_{i 0 k}} > \prod_{k \in {i} \cup 𝒰_{i}} ϕ_{k} (t) \prod_{k \in ℒ_{i} \cup ℰ_{i}} ϕ_{k} {(t)}^{{\hat{γ}}_{i 1 k}} ρ_{k} {(t)}^{1 - {\hat{γ}}_{i 1 k}}

(31)

where $ρ_{k} (t)$ and $ϕ_{k} (t)$ are defined in (23). Dividing both sides of (31) by its right-hand side further gives

\frac{ρ_{i} (t)}{ϕ_{i} (t)} \times \prod_{k \in 𝒰_{i}} {(\frac{ρ_{k} (t)}{ϕ_{k} (t)})}^{1 - {\hat{γ}}_{i 0 k}} \times \prod_{k \in ℒ_{i}} {(\frac{ρ_{k} (t)}{ϕ_{k} (t)})}^{{\hat{γ}}_{i 1 k}} \times \prod_{k \in ℰ_{i}} {(\frac{ρ_{k} (t)}{ϕ_{k} (t)})}^{{\hat{r}}_{i 1 k} - {\hat{γ}}_{i 0 k}} > 1.

(32)

The first three terms in (32) are increasing functions in $t$ , because it can be easily verified that $ρ_{k} (t)$ is increasing and $ϕ_{k} (t)$ is decreasing in $t$ . In addition, Lemma 1 implies that ${\hat{γ}}_{i 1 k} - {\hat{γ}}_{i 0 k} \geq 0$ and therefore the fourth term in (32) is also increasing $t$ . Therefore, for any $t^{'} > t$ , the inequality (32) implies

\frac{ρ_{i} (t^{'})}{ϕ_{i} (t^{'})} \times \prod_{k \in 𝒰_{i}} {(\frac{ρ_{k} (t^{'})}{ϕ_{k} (t^{'})})}^{1 - {\hat{γ}}_{i 0 k}} \times \prod_{k \in ℒ_{i}} {(\frac{ρ_{k} (t^{'})}{ϕ_{k} (t^{'})})}^{{\hat{γ}}_{i 1 k}} \times \prod_{k \in ℰ_{i}} {(\frac{ρ_{k} (t^{'})}{ϕ_{k} (t^{'})})}^{{\hat{γ}}_{i 1 k} - {\hat{γ}}_{i 0 k}} > 1

(33)

which is equivalent to $\hat{γ} (t^{'}) = {\hat{γ}}_{i 0} (t^{'}) ⟺ {\hat{γ}}_{i} (t^{'}) = 0$ using the same logic as (29) and (30). We have thus showed ${\hat{γ}}_{i} (t) = 0 \Rightarrow {\hat{γ}}_{i} (t^{'}) = 0$ for any given $i$ and completed the proof of Theorem 1.

Proof of Proposition 3: Let $F_{k} (t ∣ y_{1}, \dots, y_{K})$ denote the posterior cdf of $θ_{k}$ and assume that it is continuous, so that $p_{k} (t) = E \{γ_{k} (t) ∣ y_{1}, \dots, y_{k}\} = 1 - F_{k} (t)$ . The estimator ${\hat{θ}}_{k}$ will thus solve $1 - F_{k} ({\hat{θ}}_{k}) = 1 - ϵ ⟺ {\hat{θ}}_{k} = F_{k}^{- 1} (ϵ)$ .

Proof of Proposition 4: First, we note that ${\hat{γ}}_{{\overline{C}}_{t}} = 1$ maximises $H_{\bar{C_{t}}}$ over $Γ ({\overline{C}}_{t})$ , by applying Proposition 2 with the fact that ${\hat{γ}}_{k^{'}}^{B} (t) = 1$ for $k^{'} \in {\overline{C}}_{t}$ (hence $\notin Z_{t}$ ). Next, we note that $x_{k^{'}} ≮ x_{k}$ for any pair $k^{'} \in {\overline{C}}_{t}$ and $k \in C_{t}$ . Thus, the ensemble formed by putting ${\hat{γ}}_{{\overline{C}}_{t}}$ and ${\hat{γ}}_{C_{t}}$ together will belong to $Γ$ ; and since they maximise $H_{{\overline{C}}_{t}}$ and $H_{C_{t}}$ respectively, the ensemble thus formed maximises $H = H_{\bar{C_{t}}} \times H_{C_{t}}$ .

Footnotes

conflict of interest

The authors have no conflicts of interest to disclose.

data availability statement

The data underlying this article will be shared on reasonable request to the corresponding author.

references

Ayer M, Brunk HD, Ewing GM, Reid WT and Silverman E (1955) An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26, 641–647. [Google Scholar]
Bacchetti P (1989) Additive isotonic regression. Journal of the American Statistical Association, 84, 289–294. [Google Scholar]
Banerjee S, Gelfand AE, Finley AO and Sang H (2008) Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. B, 70, 825–848. [DOI] [PMC free article] [PubMed] [Google Scholar]
Barlow RE, Bartholowmew DJ, Bremmer JM and Brunk HD (1972) Statistical Inference Under Order Restrictions. John Wiley & Sons, New York. [Google Scholar]
Bornkamp B, Ickstadt K and Dunson D (2010) Stochastically ordered multiple regression. Biostatistics, 11, 419–431. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brunk HD (1955) Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics, 26, 607–616. [Google Scholar]
Burridge J (1981) Empirical Bayes analysis of survival time data. J. R. Statist. Soc. B, 43, 65–75. [Google Scholar]
Cheung K, Ling W, Karr CJ, Weingardt K, Schueller SM and Mohr DC (2018) Evaluation of a recommender app for apps for the treatment of depression and anxiety: an analysis of longitudinal user engagement. Journal of the American Medical Informatics Association, 25, 955–962. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cheung Y, Chandereng T and Diaz KM (2022) A novel framework to estimate multidimensional minimum effective doses using asymmetric posterior gain and e-tapering. The Annals of Applied Statistics, 16, 1445–1458. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chung Y, Ivanova A, Hudgens M and Fine J (2018) Partial likelihood estimation of isotonic proportional hazards models. Biometrika, 105, 133–148. [DOI] [PMC free article] [PubMed] [Google Scholar]
Datta A, Banerjee S, Finley AO and Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dellaportas P and Smith A (1993) Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Appl. Statist, 42, 443–459. [Google Scholar]
Fu A, Narasimhan B and Boyd S (2020) CVXR: An R package for disciplined convex optimization. Journal of Statistical Software, 94, 1–34. [Google Scholar]
Gelman A, Carlin J, Stern H and Rubin D (1995) Bayesian Data Analysis. Chapman & Hall. [Google Scholar]
Gilks W and Wild P (1992) Adaptive rejection sampling for gibbs sampling. Appl. Statist, 41, 337–348. [Google Scholar]
Guhaniyogi R and Banerjee S (2018) Meta-kriging: scalable Bayesian modeling and inference for massive spatial datasets. Technometrics, 60, 430–444. [DOI] [PMC free article] [PubMed] [Google Scholar]
Holmes CC and Heard NA (2003) Generalized monotonic regression using random change points. Statistics in Medicine, 22, 623–638. [DOI] [PubMed] [Google Scholar]
Ibrahim J and Laud P (1991) On Bayesian analysis of generalized linear models using Jeffrey’s prior. Journal of the American Statistical Association, 86, 981–986. [Google Scholar]
Jordan MI, Lee JD and Yang Y (2019) Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681. [Google Scholar]
Leitenstorfer F and Tutz G (2007) Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics, 8, 654–673. [DOI] [PubMed] [Google Scholar]
Lin L and Dunson DB (2014) Bayesian monotone regression using Gaussian process projection. Biometrika, 101, 303–317. [Google Scholar]
Mander A and Sweeting M (2015) A product of independent beta probabilities dose escalation design for dual-agent phase I trials. Statistics in Medicine, 34, 1261–1276. [DOI] [PMC free article] [PubMed] [Google Scholar]
McCullagh P and Nelder J (1989) Generalized Linear Models. Chapman & Hall/CRC, second edn. [Google Scholar]
Morton-Jones T, Diggle P, Parker L, Dickinson HO and Binks K (2000) Additive isotonic regression models in epidemiology. Statistics in Medicine, 19, 849–859. [DOI] [PubMed] [Google Scholar]
Ramsay JO (1988) Monotone regression splines in action. Statistical Science, 3, 425–441. [Google Scholar]
Robertson T, Wright FT and Dykstra RL (1988) Order Restricted Statistical Inference. John Wiley & Sons, New York. [Google Scholar]
Sinha D and Dey DK (1997) Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92, 1195–1212. [Google Scholar]
Sinha D, Ibrahim J and Chen M (2003) A Bayesian justification of Cox’s partial likelihood. Biometrika, 90, 629–641. [Google Scholar]
Stan Development Team (2021) RStan: the R interface to Stan. R package version 2.21.3, https://mc-stan.org/.
Stein J, Rodstein BM, Levine SR, Cheung K, Sicklick A, Silver B, Hedeman R Egan A, Borg-Jensen P and Magdon-Ismail Z (2022) Which road to recovery?: Factors influencing postacute stroke discharge destinations: A Delphi study. Stroke, 53, 947–955. [DOI] [PubMed] [Google Scholar]
Wang Y and Taylor J (2004) Monotone constrained tensor-product B-spline with application to screening studies. The University of Michigan Department of Biostatistics Working Paper Series, 1022, Berkeley Electronic Press. [Google Scholar]
Wright FT (1982) Monotone regression estimates for grouped observations. The Annals of Statistics, 10, 278–286. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

final submitted supplement

NIHMS1970282-supplement-final_submitted_supplement.pdf^{(478.2KB, pdf)}

Data Availability Statement

The data underlying this article will be shared on reasonable request to the corresponding author.

[R1] Ayer M, Brunk HD, Ewing GM, Reid WT and Silverman E (1955) An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26, 641–647. [Google Scholar]

[R2] Bacchetti P (1989) Additive isotonic regression. Journal of the American Statistical Association, 84, 289–294. [Google Scholar]

[R3] Banerjee S, Gelfand AE, Finley AO and Sang H (2008) Gaussian predictive process models for large spatial data sets. J. R. Statist. Soc. B, 70, 825–848. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] Barlow RE, Bartholowmew DJ, Bremmer JM and Brunk HD (1972) Statistical Inference Under Order Restrictions. John Wiley & Sons, New York. [Google Scholar]

[R5] Bornkamp B, Ickstadt K and Dunson D (2010) Stochastically ordered multiple regression. Biostatistics, 11, 419–431. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] Brunk HD (1955) Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics, 26, 607–616. [Google Scholar]

[R7] Burridge J (1981) Empirical Bayes analysis of survival time data. J. R. Statist. Soc. B, 43, 65–75. [Google Scholar]

[R8] Cheung K, Ling W, Karr CJ, Weingardt K, Schueller SM and Mohr DC (2018) Evaluation of a recommender app for apps for the treatment of depression and anxiety: an analysis of longitudinal user engagement. Journal of the American Medical Informatics Association, 25, 955–962. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] Cheung Y, Chandereng T and Diaz KM (2022) A novel framework to estimate multidimensional minimum effective doses using asymmetric posterior gain and e-tapering. The Annals of Applied Statistics, 16, 1445–1458. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] Chung Y, Ivanova A, Hudgens M and Fine J (2018) Partial likelihood estimation of isotonic proportional hazards models. Biometrika, 105, 133–148. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Datta A, Banerjee S, Finley AO and Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111, 800–812. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Dellaportas P and Smith A (1993) Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. Appl. Statist, 42, 443–459. [Google Scholar]

[R13] Fu A, Narasimhan B and Boyd S (2020) CVXR: An R package for disciplined convex optimization. Journal of Statistical Software, 94, 1–34. [Google Scholar]

[R14] Gelman A, Carlin J, Stern H and Rubin D (1995) Bayesian Data Analysis. Chapman & Hall. [Google Scholar]

[R15] Gilks W and Wild P (1992) Adaptive rejection sampling for gibbs sampling. Appl. Statist, 41, 337–348. [Google Scholar]

[R16] Guhaniyogi R and Banerjee S (2018) Meta-kriging: scalable Bayesian modeling and inference for massive spatial datasets. Technometrics, 60, 430–444. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] Holmes CC and Heard NA (2003) Generalized monotonic regression using random change points. Statistics in Medicine, 22, 623–638. [DOI] [PubMed] [Google Scholar]

[R18] Ibrahim J and Laud P (1991) On Bayesian analysis of generalized linear models using Jeffrey’s prior. Journal of the American Statistical Association, 86, 981–986. [Google Scholar]

[R19] Jordan MI, Lee JD and Yang Y (2019) Communication-efficient distributed statistical inference. Journal of the American Statistical Association, 114, 668–681. [Google Scholar]

[R20] Leitenstorfer F and Tutz G (2007) Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics, 8, 654–673. [DOI] [PubMed] [Google Scholar]

[R21] Lin L and Dunson DB (2014) Bayesian monotone regression using Gaussian process projection. Biometrika, 101, 303–317. [Google Scholar]

[R22] Mander A and Sweeting M (2015) A product of independent beta probabilities dose escalation design for dual-agent phase I trials. Statistics in Medicine, 34, 1261–1276. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] McCullagh P and Nelder J (1989) Generalized Linear Models. Chapman & Hall/CRC, second edn. [Google Scholar]

[R24] Morton-Jones T, Diggle P, Parker L, Dickinson HO and Binks K (2000) Additive isotonic regression models in epidemiology. Statistics in Medicine, 19, 849–859. [DOI] [PubMed] [Google Scholar]

[R25] Ramsay JO (1988) Monotone regression splines in action. Statistical Science, 3, 425–441. [Google Scholar]

[R26] Robertson T, Wright FT and Dykstra RL (1988) Order Restricted Statistical Inference. John Wiley & Sons, New York. [Google Scholar]

[R27] Sinha D and Dey DK (1997) Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92, 1195–1212. [Google Scholar]

[R28] Sinha D, Ibrahim J and Chen M (2003) A Bayesian justification of Cox’s partial likelihood. Biometrika, 90, 629–641. [Google Scholar]

[R29] Stan Development Team (2021) RStan: the R interface to Stan. R package version 2.21.3, https://mc-stan.org/.

[R30] Stein J, Rodstein BM, Levine SR, Cheung K, Sicklick A, Silver B, Hedeman R Egan A, Borg-Jensen P and Magdon-Ismail Z (2022) Which road to recovery?: Factors influencing postacute stroke discharge destinations: A Delphi study. Stroke, 53, 947–955. [DOI] [PubMed] [Google Scholar]

[R31] Wang Y and Taylor J (2004) Monotone constrained tensor-product B-spline with application to screening studies. The University of Michigan Department of Biostatistics Working Paper Series, 1022, Berkeley Electronic Press. [Google Scholar]

[R32] Wright FT (1982) Monotone regression estimates for grouped observations. The Annals of Statistics, 10, 278–286. [Google Scholar]

PERMALINK

Monotone response surface of multi-factor condition: estimation and Bayes classifiers

Ying Kuen Cheung, PhD

Keith M Diaz, PhD

Abstract

1 |. INTRODUCTION

2 |. METHODS

2.1 |. A partially ordered classifier ensemble

2.2 |. Inverting PIPE-classifiers (iPIPE)

3 |. COMPUTATION ALGORITHMS

3.1 |. A sweep algorithm

3.2 |. Numerical illustration: clinical decisions for rehabilitation

TABLE 1.

3.3 |. Sequential random subset maximisation

TABLE 2.

4 |. APPLICATIONS

4.1 |. Estimating population parameters under partial ordering

Figure 1.

4.2 |. Regression models

4.3 |. Covariate-dependent response surface

4.4 |. Hierarchical models for repeated measurements

TABLE 3.

Figure 2.

5 |. SIMULATION STUDY

5.1 |. Simulation setting 1: D=2 with equal sample size

TABLE 4.

5.2 |. Simulation setting 2: D=4 with uneven sample sizes

Figure 3.

5.3 |. Simulation setting 3: D=7 with uneven sample sizes

Figure 4.

5.4 |. Simulation setting 4: The effect of dimension D

Figure 5.

Figure 6.

TABLE 5.

6 |. DISCUSSION

Supplementary Material

acknowledgements

Appendix

| Appendix A: Proof of Lemma 1

| Appendix B: Proofs of Theorem 1, Proposition 3, and Proposition 4

Footnotes

data availability statement

references

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

5.1 |. Simulation setting 1: $D = 2$ with equal sample size

5.2 |. Simulation setting 2: $D = 4$ with uneven sample sizes

5.3 |. Simulation setting 3: $D = 7$ with uneven sample sizes

5.4 |. Simulation setting 4: The effect of dimension $D$