Author manuscript; available in PMC: 2018 Dec 1.
Published in final edited form as: Biometrika. 2017 Dec;104(4):939–952.

Bayesian Local Extremum Splines

M W WHEELER 1, D B DUNSON 2, A H HERRING 3
PMCID: PMC5798493  NIHMSID: NIHMS926672  PMID: 29422695

Summary

We consider shape restricted nonparametric regression on a closed set X, where it is reasonable to assume the function has no more than H local extrema interior to X. Following a Bayesian approach we develop a nonparametric prior over a novel class of local extremum splines. This approach is shown to be consistent when modeling any continuously differentiable function within the class considered, and is used to develop methods for testing hypotheses on the shape of the curve. Sampling algorithms are developed, and the method is applied in simulation studies and data examples where the shape of the curve is of interest.

Keywords: Constrained function estimation, Isotonic regression, Monotone splines, Nonparametric, Shape constraint

1. Introduction

This paper considers Bayesian modeling of an unknown function f0: X → ℝ, where it is known that f0 has at most H local extrema, or change points, interior to X, and one wishes to estimate the function subject to constraints or to test the hypothesis that the function has a specific shape. For example, one may wish to compare a monotone function with one having an N shape. We propose a spline construction that allows for nonparametric estimation of shape-constrained functions having at most H change points. The approach places a prior over a knot set dense in X, and a Markov chain Monte Carlo algorithm is developed to sample over the models defined by this knot set. The method allows for nonparametric hypothesis testing of different shapes within the class of functions considered.

The shape-constrained regression literature focuses primarily on functions that are monotone, convex, or have a single minimum; that is, cases with H ≤ 1. Ramgopal et al. (1993), Lavine & Mockus (1995), and Bornkamp & Ickstadt (2009) consider priors over cumulative distribution functions used to model monotone curves. Holmes & Mallick (2003), Neelon & Dunson (2004), Meyer (2008), and Shively et al. (2009) develop spline-based approaches for monotone functions. Hans & Dunson (2005) design a prior for umbrella-shaped functions, while Shively et al. (2011) propose methods for fixed- and free- knot splines that model continuous segments having a single unknown change point.

Extending these approaches to broader shape constraints is not straightforward. For example, to obtain H = 3 change points, one could define a prior over B-spline bases (de Boor, 2001, page 87) having four monotone segments that alternately increase and decrease. However, for even a moderate number of pre-specified knots and a known number of change points, allowing for uncertainty in the locations of the change points leads to a daunting computational problem. Bayesian computation via Markov chain Monte Carlo is subject to slow mixing and convergence rates in alternating between updating the spline coefficients conditionally on the change points and vice versa, and it is not clear how to devise algorithms that can efficiently update both simultaneously. These difficulties are compounded by allowing for the possibility that some of the change points should be removed, which is commonly the situation in applications. By defining a new spline basis based on the number of change points, we bypass these issues.

Little work has been done on nonparametric Bayesian testing of curve shapes. Salomond (2014) and Scott et al. (2015) consider Bayesian nonparametric testing of monotonicity versus an unspecified nonparametric alternative, but do not consider shapes beyond monotonicity. Our approach allows for testing of all shapes, where shape is defined as the type and sequence of extrema. For example, one can use this approach to test for an umbrella shape versus an N-shaped curve and use the same procedure to test the umbrella shape against monotone alternatives.

We propose a new approach to incorporating shape constraints based on splines that are carefully constructed to induce curves having a particular number of extrema. This is similar in spirit to the I-spline construction of Ramsay (1988) or the C-spline construction for convex splines (Meyer, 2008; Meyer et al., 2011), both of which create a spline construction based upon the derivative of the spline. When paired with positivity constraints on the spline coefficients, our construction enforces shape restrictions on the curve of interest by limiting the number of change points.

Another key aspect of our approach is that we place a prior over a countable dense set of knots, which allows the number of the splines in the model space to grow. This bypasses the sensitivity to choice of the number of knots, while facilitating computation and theory on consistency. In particular, we propose a prior over nested model spaces where the location of the knots is known for each model. This allows for a straightforward reversible jump Markov chain Monte Carlo algorithm (Green, 1995) based upon Godsill (2001). This is different from much of the previous Bayes literature allowing unknown numbers of knots (Biller, 2000; DiMatteo et al., 2001). In these methods, the knot locations are unknown, and the reversible jump Markov chain Monte Carlo proposal must propose a knot to add or delete as well as its location. Such algorithms are notoriously inefficient.

2. Model

2·1. Local Extremum Spline Construction

Let ℱH be a set of functions defined on the closed set X, such that for f0 ∈ ℱH, f0 is continuously differentiable and has H or fewer local extrema interior to X. Such functions can be modeled using B-spline approximations of the form

$$f(x) = \sum_{k=1}^{K+j-1} \beta_k B_{(j,k)}(x). \qquad (1)$$

Here, βk is a scalar coefficient and B(j,k)(x) is a B-spline function of order j defined on the knot set T = {τk : k = 1, …, K}, τ1 ≤ … ≤ τK, which includes end knots. For any knot set, de Boor (2001, page 145) showed that there exist spline approximations such that ‖f − f0‖∞ ≤ Δ‖f0′‖∞, where Δ is the maximum difference between adjacent knots. Though this construction can be used to model f0 with arbitrary accuracy, it does not ensure that the approximating function f is itself in ℱH.

We force f ∈ ℱH to have at most H local extrema by defining a new spline basis

$$B^{*}_{(j,k)}(x) = M \int_{-\infty}^{x} \left\{ \prod_{h=1}^{H} (\xi - \alpha_h) \right\} B_{(j,k)}(\xi)\, d\xi, \qquad (2)$$

where B(j,k)(x) is a B-spline constructed using the knot set T, {α1, …, αH} are distinct change points, and M is a fixed integer. Letting B*(j,0)(x) = 1, if βk ≥ 0 for all k ≥ 1, any linear combination of the local extremum spline basis functions B*(j,k)(x), for any distinct values of α1, …, αH in (2), will be in ℱH.

Proposition 1

If $f(x) = \sum_{k=0}^{K+j-1} \beta_k B^{*}_{(j,k)}(x)$ for any K ≥ 1 with M ∈ {−1, 1}, j ≥ 1, and βk ≥ 0 for all k ≥ 1, then f ∈ ℱH.

This result follows from the constraint on the βk coefficients. By forcing βk ≥ 0 for k ≥ 1, the sign of the derivative is controlled by the polynomial $M \prod_{h=1}^{H} (x - \alpha_h)$, which allows a maximum of H local extrema located at the change points {α1, …, αH}. When βk = … = βk+1 = 0 and αh ∈ [τk+j, τk+j+1], αh does not define a unique extremum. In this case, there is a flat region, and multiple configurations of the change point parameters can give the same curve. Otherwise, the extrema are uniquely defined for all αh ∈ X, and fewer than H extrema can be considered if αh ∉ X.
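The construction in (2) and Proposition 1 can be checked numerically. The following is a minimal sketch, not the authors' implementation: it evaluates B-splines with the Cox-de Boor recursion, forms the derivative as M times the change point polynomial times a nonnegative B-spline combination, and integrates it with the trapezoid rule. The function names, the padded knot vector convention, and the trapezoid rule are all assumptions made for illustration.

```python
def bspline(j, k, x, knots):
    """Cox-de Boor recursion: B-spline of order j (degree j - 1) starting at knots[k]."""
    if j == 1:
        return 1.0 if knots[k] <= x < knots[k + 1] else 0.0
    left = right = 0.0
    if knots[k + j - 1] > knots[k]:
        left = (x - knots[k]) / (knots[k + j - 1] - knots[k]) * bspline(j - 1, k, x, knots)
    if knots[k + j] > knots[k + 1]:
        right = (knots[k + j] - x) / (knots[k + j] - knots[k + 1]) * bspline(j - 1, k + 1, x, knots)
    return left + right


def lx_derivative(x, beta, alphas, M, j, knots):
    """Derivative of the curve: M * prod_h (x - alpha_h) * sum_k beta_k B_(j,k)(x)."""
    poly = float(M)
    for a in alphas:
        poly *= (x - a)
    return poly * sum(b * bspline(j, k, x, knots) for k, b in enumerate(beta))


def lx_curve(grid, beta0, beta, alphas, M, j, knots):
    """Integrate the derivative over the grid (trapezoid rule), starting at the intercept."""
    values = [beta0]
    for x0, x1 in zip(grid[:-1], grid[1:]):
        d0 = lx_derivative(x0, beta, alphas, M, j, knots)
        d1 = lx_derivative(x1, beta, alphas, M, j, knots)
        values.append(values[-1] + 0.5 * (x1 - x0) * (d0 + d1))
    return values


def count_sign_changes(values, tol=1e-12):
    """Number of sign changes in a sequence, ignoring (near-)zero entries."""
    signs = [1 if v > tol else -1 for v in values if abs(v) > tol]
    return sum(1 for s0, s1 in zip(signs[:-1], signs[1:]) if s0 != s1)
```

With nonnegative coefficients and two change points inside the domain, the derivative evaluated on a grid changes sign exactly twice, which is the behavior Proposition 1 describes. Because the B-splines vanish to the left of the first knot, starting the integration at the left end of the grid is equivalent to the lower limit in (2).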

Theorem 1

For any f0 ∈ ℱH and ε > 0 there exist a knot set T and a local extremum spline fLX defined on this knot set such that

$$\| f_0 - f_{LX} \|_\infty < \varepsilon.$$

The flexibility of local extremum splines is attributable to the B-splines used in their construction. The proof of Theorem 1 assumes that M can be chosen to be positive or negative, which allows all functions in ℱH to be approximated. If M is fixed, then any function with H − 1 extrema can be modeled. For exactly H extrema, the approach is limited to modeling functions that are either initially increasing or initially decreasing, and this depends on the sign of M.

Though the polynomial weighting does not affect the ability of the local extremum spline to model arbitrary functions in ℱH, it does impact the magnitude of the spline, $\sup_{x \in X} |B^{*}_{(j,k)}(x)|$, which may cause difficulty in the prior specification. To minimize this effect it is often beneficial to construct the splines on the interval (−0·5, 0·5). Additionally, it is often beneficial to multiply M by a fixed constant to aid in prior specification.

2·2. Infill Process Prior

Bayesian methods for automatic knot selection (Biller, 2000; DiMatteo et al., 2001) commonly define priors over the number and location of knots. Using free knots presents computational challenges, while fixed knots are too inflexible; we address this by defining a prior over a branching process where the children of each generation represent knot locations that are binary infills of the previous generation. This defines a nested set of spline models such that successive generations produce knots that can be arbitrarily close.

To make these ideas explicit, define $T_N = \{a/2^{N+1} : a = 1, 3, \ldots, 2^{N+1} - 1\}$ with N ∈ {0, 1, 2, 3, …}. Assume X = [0, 1] for the sake of exposition, and consider an infinite complete binary tree. In this tree, each node at a given depth N is uniquely labeled using an element from TN. If the node’s label is a/2^{N+1}, its children are labeled (2a − 1)/2^{N+2} and (2a + 1)/2^{N+2}. For example, the node labeled 3/8 at N = 2 has children labeled 5/16 and 7/16, and the root node labeled 1/2 has children labeled 1/4 and 3/4.
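The labeling rule can be written down directly. The sketch below uses exact rational arithmetic; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def level_labels(N):
    """Labels at depth N: a / 2^(N+1) for odd a = 1, 3, ..., 2^(N+1) - 1."""
    denom = 2 ** (N + 1)
    return [Fraction(a, denom) for a in range(1, denom, 2)]


def children(label):
    """Children of a / 2^(N+1): (2a - 1) / 2^(N+2) and (2a + 1) / 2^(N+2)."""
    a, denom = label.numerator, label.denominator
    return Fraction(2 * a - 1, 2 * denom), Fraction(2 * a + 1, 2 * denom)
```

Since a is odd and the denominator is a power of two, each label is already in lowest terms, so the numerator and denominator of the `Fraction` recover a and the depth without extra bookkeeping.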

We induce a prior on the set of local extremum spline basis functions through a branching process over this tree. The process starts at the root node N = 0 where the generation of children occurs via two independent Bernoulli experiments having probability of success ζ. On each success, a child is generated, and its label is added to the knot set. This process repeats until it dies out. If ζ < 0·5, the probability of extinction is 1 (Feller, 1974, page 297). To favor parsimony, we define the probability of success for a node at a given depth N to be 0·5^{N+1}, which decreases the probability of adding a new node the larger the tree becomes. The tree ℳ generated from this process corresponds to a knot set Tℳ. We complete the knot set by adding end knots {0, 1}.
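A draw from this branching process can be simulated as follows. This is a sketch under stated assumptions: the root label 1/2 is always kept in the knot set, the success probability at depth N applies to each of the two children of a node at that depth, and `max_depth` is a safety guard that is not part of the prior.

```python
import random
from fractions import Fraction


def sample_knot_tree(rng, max_depth=30):
    """Draw a knot set from the infill branching process, including end knots 0 and 1."""
    root = Fraction(1, 2)
    knots, frontier, depth = {root}, [root], 0
    while frontier and depth < max_depth:
        p_success = 0.5 ** (depth + 1)  # success probability for nodes at this depth
        next_frontier = []
        for node in frontier:
            a, denom = node.numerator, node.denominator
            for child in (Fraction(2 * a - 1, 2 * denom), Fraction(2 * a + 1, 2 * denom)):
                if rng.random() < p_success:  # one Bernoulli experiment per child
                    knots.add(child)
                    next_frontier.append(child)
        frontier, depth = next_frontier, depth + 1
    return sorted(knots | {Fraction(0), Fraction(1)})
```

Because the success probability halves with each generation, the expected number of offspring per node drops below one after the first generation and the simulated trees are typically small.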

Letting K = |Tℳ| be the number of knots for tree ℳ including end knots, there are K + j − 1 basis functions. Letting βk denote the coefficient on B*(j,k)(x), we choose the prior:

$$p(\beta_k \mid \mathcal{M}) = \pi\, 1(\beta_k = 0) + (1 - \pi)\, \mathrm{Exp}(\beta_k; \lambda), \quad 1 \le k \le K + j - 1, \qquad (3)$$

where Exp(βk; λ) is an exponential distribution with rate parameter λ, π is the prior probability of βk = 0, and the βk are drawn independently conditionally on ℳ, π, and λ. For the intercept, we let β0 ~ N(0, c), and we allow for greater adaptivity to the data through hyperpriors, π ~ Be(ν, ω) and λ ~ Ga(δ, κ)1(λ > ε), a gamma distribution truncated slightly above zero to guarantee posterior consistency. In practice, the truncation point ε is set to 10^{−5}, making the prior practically indistinguishable from the untruncated gamma distribution.
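Sampling from the coefficient prior (3) and its hyperpriors is straightforward. The sketch below is illustrative only: the function names are hypothetical, the default hyperparameter values are placeholders, Ga(δ, κ) is assumed to be parameterized by shape δ and rate κ, and the truncation is handled by simple rejection.

```python
import random


def sample_hyper(rng, nu=2.0, omega=18.0, delta=0.2, kappa=2.0, eps=1e-5):
    """Draw pi ~ Be(nu, omega) and lambda ~ Ga(delta, kappa) truncated to lambda > eps."""
    pi = rng.betavariate(nu, omega)
    lam = 0.0
    while lam <= eps:  # rejection step for the truncation slightly above zero
        lam = rng.gammavariate(delta, 1.0 / kappa)  # assumes kappa is a rate
    return pi, lam


def sample_beta(rng, n_coef, pi, lam):
    """Each coefficient is 0 with probability pi, otherwise Exponential with rate lam."""
    return [0.0 if rng.random() < pi else rng.expovariate(lam) for _ in range(n_coef)]
```

Every draw is nonnegative, which is the constraint Proposition 1 requires, and the point mass at zero lets the prior produce flat regions.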

To allow uncertainty in locations of the change points, we choose the prior

$$p(\alpha) = \prod_{h=1}^{H} \mathrm{TN}\{\alpha_h; (b - a)/2, 1, a, b\}, \qquad (4)$$

where TN{(b − a)/2, 1, a, b} is a normal distribution with mean (b − a)/2 and variance 1, truncated below by a and above by b with X ⊂ [a, b]. If αh ≤ inf X or αh ≥ sup X, then the change point is removed. We assume that M is pre-specified corresponding to prior knowledge of whether the function is initially increasing or decreasing, though generalizations to place a Bernoulli or alternative prior on M are straightforward.

The prior for the change point parameters is defined such that X ⊂ [a, b]. A change point placed outside of X allows the derivative of f to be non-zero at inf X or sup X. In practice, results are insensitive to the choice of a and b. In what follows, we choose a = inf(X) − Δ and b = sup(X) + Δ, where Δ = {sup(X) − inf(X)}/2.
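The change point prior and the removal rule can be sketched as follows. The function names are hypothetical, the truncated normal is sampled by rejection for simplicity, and the mean is taken exactly as written in (4).

```python
import random


def change_point_bounds(x_lo, x_hi):
    """Truncation limits a = inf(X) - Delta and b = sup(X) + Delta."""
    delta = (x_hi - x_lo) / 2.0
    return x_lo - delta, x_hi + delta


def sample_change_points(rng, H, x_lo, x_hi):
    """Draw H change points from the truncated normal prior; keep those interior to X."""
    a, b = change_point_bounds(x_lo, x_hi)
    mean = (b - a) / 2.0  # mean as written in (4)
    draws = []
    while len(draws) < H:
        z = rng.gauss(mean, 1.0)
        if a <= z <= b:  # rejection step enforces the truncation
            draws.append(z)
    active = sorted(z for z in draws if x_lo < z < x_hi)  # others are removed
    return draws, active
```

For X = [0, 1] the limits are a = −0·5 and b = 1·5, so a draw can fall outside X on either side, in which case the corresponding extremum is dropped from the curve.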

2·3. Prior Properties

Define ℱH+ as the space of continuously differentiable functions with H or fewer local extrema, such that, for all f0 ∈ ℱH+ having exactly H extrema, the first extremum from the left is a maximum, and all f0 ∈ ℱH+ having fewer than H extrema are also in ℱH−1. Conversely, define ℱH− as the set of continuously differentiable functions with H or fewer local extrema, such that, for all functions having exactly H extrema, the first from the left is a minimum, and all functions f0 ∈ ℱH− having fewer than H extrema are also in ℱH−1. The prior places positive probability on ε-neighborhoods of any f0 in ℱH− or ℱH+, depending on the sign of M.

Lemma 1

Letting fLX be a randomly generated local extremum spline from the prior defined in §2·2, for all f0 ∈ ℱH−1,

$$\mathrm{pr}(\| f_{LX} - f_0 \|_\infty < \varepsilon) > 0.$$

This holds for all f0 ∈ ℱH+ if H is odd and M < 0 or H is even and M > 0. Otherwise, if H is even and M < 0 or H is odd and M > 0, this holds for all f0 ∈ ℱH−.
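The dependence on H and the sign of M follows from the sign of the derivative to the left of the first change point, where each of the H factors (x − αh) is negative. A small sketch, with a hypothetical function name:

```python
def first_extremum_class(H, M):
    """Return '+' if a curve with exactly H extrema starts by increasing, so that its
    first extremum is a maximum, and '-' if it starts by decreasing."""
    initial_slope_sign = (1 if M > 0 else -1) * (-1) ** H
    return "+" if initial_slope_sign > 0 else "-"
```

For example, H = 1 with M < 0 gives an initially increasing curve whose single extremum is a maximum, which is the umbrella shape.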

Using this result we can show posterior consistency. Assume that Y = (y1, …, yn)T are observed at locations (x1, …, xn) such that yi ~ N{f0(xi), σ0²}. Following Choi & Schervish (2007), assume that the design points are independent and identically distributed from some probability distribution Q on the interval X, or observed using a fixed design such that max(|xi − xi+1|) < (K1 n)^{−1}, where 0 < K1 < 1 and i < n. Define the neighborhoods Wε,n = {(f, σ) : ∫ |f(x) − f0(x)|dQn(x) < ε, |σ/σ0 − 1| < ε} and Uε = {(f, σ) : dQ(f, f0) < ε, |σ/σ0 − 1| < ε}, where dQ(f1, f2) = inf {ε > 0 : Q[{x : |f1(x) − f2(x)| > ε}] < ε}. Under the assumption that the prior over σ assigns positive probability to every ε-neighborhood of σ0, one has:

Theorem 2

Let fLX be a randomly generated curve from the prior defined in §2·2 with f0 ∈ ℱH−1. If Pf0,σ0 is the joint distribution of $\{y_i\}_{i=1}^{\infty}$ conditionally on $\{x_i\}_{i=1}^{\infty}$, $\{Z_i\}_{i=1}^{\infty}$ is a sequence of open subsets in ℱH−1 that is defined by Wε,n for fixed designs or by Uε for random designs, and Πn is the posterior distribution of f0 given $\{y_i\}_{i=1}^{n}$, then

$$\Pi_n(f \in Z_n^{C} \mid y_1, \ldots, y_n) \to 0 \quad \text{almost surely } [P_{f_0,\sigma_0}].$$

Further, for H odd, if M < 0 this relation holds for f0 ∈ ℱH+; otherwise it holds for f0 ∈ ℱH−. Similarly, for H even, if M > 0 it holds for f0 ∈ ℱH+; otherwise it holds for f0 ∈ ℱH−.

The proof of this consistency result follows from Choi & Schervish (2007) and the prior positivity result above. The condition on the prior over σ2 can be satisfied with an inverse-Gamma distribution.

2·4. Bayes Factors for Testing Curve Shapes

Our approach allows one to define the shape of the curve through the α vector and to place prior probability on a class of functions having a given shape, i.e., the number and type of extrema in X. When there are flat regions of f0 the shape of the curve is not uniquely identifiable based upon the configuration of α, and hypothesis tests may be inconclusive. For an example of this, see the consistency arguments for monotone curve testing in Scott et al. (2015). In what follows, we assume that |f0′(x)| > 0 at all points in X except within flat regions.

Let ℍ1 and ℍ2 denote two distinct and non-nested sets of α values, corresponding to distinct shapes. These sets are defined by the number of αh ∈ X, the number of αh ≤ inf(X), and the number of αh ≥ sup(X). One can compute pr(Y | f0 ∈ ℍ1) and pr(Y | f0 ∈ ℍ2), with the corresponding Bayes factor between the two shapes being

$$BF_{12} = \frac{\mathrm{pr}(Y \mid f_0 \in \mathbb{H}_1)}{\mathrm{pr}(Y \mid f_0 \in \mathbb{H}_2)}. \qquad (5)$$

This quantity is not available analytically, but can be estimated through posterior simulation by monitoring the α and β vectors.
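One simple way to turn monitored α draws into an estimate uses the identity that a Bayes factor equals the posterior odds divided by the prior odds. The sketch below is an assumption about how such an estimate could be formed, not a description of the authors' estimator: it classifies each draw by its number of change points interior to X and ignores the role of flat regions induced by zero coefficients. The function names are hypothetical.

```python
def interior_count(alpha, x_lo, x_hi):
    """Number of change points strictly inside (x_lo, x_hi)."""
    return sum(1 for a in alpha if x_lo < a < x_hi)


def bayes_factor(post_alpha, prior_alpha, in_h1, in_h2):
    """Estimate BF_12 as posterior odds over prior odds from two sets of alpha draws."""
    def odds(samples):
        n1 = sum(1 for s in samples if in_h1(s))
        n2 = sum(1 for s in samples if in_h2(s))
        return n1 / n2  # undefined if no draw falls in the second shape class
    return odds(post_alpha) / odds(prior_alpha)
```

In use, `in_h1` and `in_h2` would be predicates on a single α vector, for example "no interior change points" against "at least one interior change point" when testing monotonicity.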

Any two shapes falling within ℱH can be compared using this approach. Alternatively, one may be interested in the hypothesis that f0 is in a class of functions with at least F extrema. For example, one may wish to assess whether or not the function is monotone. In this case, one can define ℍ1 to correspond to functions in ℱH with F or more extrema and ℍ2 = ℍ1^c to functions with fewer than F extrema. The value of H can be elicited as an upper bound on the number of extrema to avoid highly irregular functions. For such tests, the following result holds.

Proposition 2

Let ℍ1 be the class of functions in ℱH with F or more extrema and ℍ2 = ℍ1^c ∩ ℱH. If f0 ∈ ℍ1, then

$$BF_{12} \to \infty$$

as n → ∞.

This result is an application of Theorem 1 in Walker et al. (2004). It follows from the fact that local extremum spline representations having fewer than F change points can never be arbitrarily close to the function of interest.

3. Posterior Computation

We rely on Godsill (2001) to develop a reversible jump Markov chain Monte Carlo algorithm to sample between models. Consider moves between models ℳ and ℳ′, where the model ℳ′ has one extra knot that is a child of a node also in ℳ. As described further in the Supplementary Material, most of the local extremum spline basis functions for models ℳ and ℳ′ are identical, with only j + 2 different functions. Let β−ℳ denote the coefficients on all the splines that are the same as well as σ2, π and λ, which are parameters shared between both models. The remaining spline coefficients are βℳ and βℳ′ for models ℳ and ℳ′, respectively. As in Godsill (2001), given the shared vector β−ℳ, we marginalize βℳ and βℳ′ out of the posterior to compute p(ℳ′ | Y, β−ℳ) and p(ℳ | Y, β−ℳ). This marginalization requires numerical integration of multivariate normal distributions, which is performed using Genz (1992) and Genz & Kwong (2000). The probability of a move between two models is determined by the ratio

$$h = \frac{q(\mathcal{M}; \mathcal{M}')\, p(\mathcal{M}' \mid Y, \beta_{-\mathcal{M}})}{q(\mathcal{M}'; \mathcal{M})\, p(\mathcal{M} \mid Y, \beta_{-\mathcal{M}})}, \qquad (6)$$

where a knot insertion is made with probability min(1, h), a knot deletion is made with probability min(1, 1/h), and q(ℳ; ℳ′) is the transition probability between ℳ and ℳ′.

All proposals are made between models that are nested and differ by only one knot. When the current model has no children we propose a knot insertion with unit probability. Otherwise, the proposal adds or deletes a knot with probability 1/2, and the inserted or deleted knot is chosen uniformly. For a knot insertion, as we are going from model ℳ to ℳ′, the available knots are represented by all failures in the branching process that generated ℳ. A knot deletion going from model ℳ′ to ℳ represents all of the nodes in the branching process that generated ℳ′ that do not have any children. All other parameters, including the spline coefficients, are sampled in Gibbs steps described in the supplement.
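The proposal sets and acceptance probabilities described above can be sketched as follows. The tree is represented as a set of exact rational labels, the root is assumed never to be deleted, and all function names are hypothetical; computing the ratio h itself requires the marginalization described earlier and is not shown.

```python
from fractions import Fraction


def kids(node):
    """The two binary-infill children of a node label."""
    a, d = node.numerator, node.denominator
    return [Fraction(2 * a - 1, 2 * d), Fraction(2 * a + 1, 2 * d)]


def insertable(tree):
    """Absent children of nodes in the tree, i.e. the failures of the branching process."""
    return sorted(c for n in tree for c in kids(n) if c not in tree)


def deletable(tree):
    """Nodes of the tree that have no children in the tree; the root is kept."""
    root = Fraction(1, 2)
    return sorted(n for n in tree if n != root and not any(c in tree for c in kids(n)))


def accept_prob(h, insertion):
    """min(1, h) for a knot insertion and min(1, 1/h) for a knot deletion."""
    return min(1.0, h) if insertion else min(1.0, 1.0 / h)
```

For a tree containing only the root, `deletable` is empty, which matches the rule that an insertion is then proposed with unit probability.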

The posterior distribution is often multimodal: widely different parameter values can be well supported by the data, with low posterior density between these isolated modes, so that the sampler can become stuck in a single mode. To increase the probability of jumps between modes, a parallel tempering algorithm (Geyer, 1991, 2011) is implemented.
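In a generic parallel tempering scheme, chains targeting tempered versions of the posterior periodically propose to exchange states. The sketch below shows the standard Metropolis acceptance probability for such a swap, assuming that each chain tempers the log target by a multiplier κ between 0 and 1; whether the authors temper the likelihood or the full posterior is not stated here, so the argument names are hypothetical.

```python
import math


def swap_accept_prob(kappa_i, kappa_j, log_target_i, log_target_j):
    """Acceptance probability for exchanging the states of chains i and j, where
    log_target_i is the untempered log target evaluated at the state of chain i."""
    log_ratio = (kappa_i - kappa_j) * (log_target_j - log_target_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)
```

A swap that moves the better-supported state to the chain with the larger multiplier is always accepted, while the reverse swap is accepted with probability less than one.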

4. Simulation

4·1. Simulation Specification

We investigate our approach through simulations for functions having 0, 1, or 2 local extrema interior to X. For all simulations, we place a Ga(1, 1) prior over σ. For the hyper prior on π, we let ν = 2 and ω = 18, which puts low prior probability on flat curves. Additionally, for the hyper prior over λ, we let δ = 0·2 and κ = 2, which favors smaller values of β. All local extremum splines were constructed using B-splines of order 2 with M = 100.

The Markov chain Monte Carlo algorithm was implemented in the R programming language with some subroutines written in C++ and is available from the first author. Depending on the complexity of the function, the algorithm took between 60 and 90 seconds per 50,000 samples using one core of a 3·3 gigahertz Intel i7-5830k processor. Parallelizing the tempering algorithm on multiple cores may substantially reduce the computation time. Additional information on the convergence of the algorithm, as well as impact of the B-spline order used, is provided in the Supplementary Material.

4·2. Curve Fitting

We compare the local extremum spline approach to other nonparametric methods, including Bayesian P-splines (Lang & Brezger, 2004), a smoothing spline method described in Green & Silverman (1993), and a frequentist Gaussian process approach described in Chapter 5 of Shi & Choi (2011). We consider seven different curves with between 0 and 2 extrema and compare the fits of the other approaches and of a local extremum spline specified to have at most H = 2 change points. The following true curves are investigated:

$$\begin{aligned}
f_1(x) &= 10x^2, \\
f_2(x) &= 2 + 20\,\Phi\{(x - 0{\cdot}5)/0{\cdot}071\}, \\
f_3(x) &= 5\cos(\pi x), \\
f_4(x) &= 10(x - 0{\cdot}5)^2, \\
f_5(x) &= 2{\cdot}5 + 10\exp\{-50(x - 0{\cdot}35)^2\}, \\
f_6(x) &= 1 + 2{\cdot}5\sin\{2\pi(x + 8)\} + 10x, \\
f_7(x) &= 5\sin(2\pi x)/(x + 0{\cdot}75)^3 - 2{\cdot}5(x + 10{\cdot}5).
\end{aligned}$$

We set yi = fj(xi) + εi with εi ~ N(0, σ2). Functions f1, f2 and f3 are monotone, f4 and f5 have one change point, and f6 and f7 have two change points. For each simulation, a total of 100 equidistant points were sampled in X = [0, 1]. We consider σ2 = 1, 4. For each simulation condition, 250 data sets were generated, fitted and compared using the mean squared error, $n^{-1} \sum_{i=1}^{n} \{\hat{f}(x_i) - f(x_i)\}^2$, for the local extremum spline, smoothing spline, Bayesian P-spline, and Gaussian process approaches.
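The data-generating step and the error criterion can be sketched for two of the curves, f1(x) = 10x² and f4(x) = 10(x − 0·5)². This is an illustration only: the helper names are hypothetical, and the equidistant design is assumed to include both endpoints of [0, 1].

```python
import random


def f1(x):
    return 10.0 * x ** 2


def f4(x):
    return 10.0 * (x - 0.5) ** 2


def simulate(f, n=100, sigma2=1.0, rng=None):
    """Generate y_i = f(x_i) + e_i with e_i ~ N(0, sigma2) on an equidistant design."""
    rng = rng or random.Random()
    xs = [i / (n - 1) for i in range(n)]  # assumes the endpoints 0 and 1 are included
    ys = [f(x) + rng.gauss(0.0, sigma2 ** 0.5) for x in xs]
    return xs, ys


def mse(fitted, truth):
    """Mean squared error between fitted values and the true curve at the design points."""
    return sum((a - b) ** 2 for a, b in zip(fitted, truth)) / len(truth)
```

Applying `mse` to the raw observations rather than to a fitted curve returns roughly the noise variance, which gives a baseline against which a smoother can be judged.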

For the local extremum approach, we collected 50,000 Markov chain Monte Carlo samples, with the first 10,000 samples discarded as burn-in. For the parallel tempering algorithm, we specify 12 parallel chains with {κ1, …, κ12} = {1/30, 1/24, 1/12, 1/9, 1/5, 1/3·5, 1/2, 1/1·7, 1/1·3, 1/1·2, 1/1·1, 1}, and monitor the target chain with κ12 = 1. The P-spline approach was defined using 30 equally-spaced knots, and the prior over the second-order random walk smoothing parameter was an IG(1, 0·0005) distribution, which was one of the recommended choices in Lang & Brezger (2004). In this approach, 25,000 posterior samples were taken, discarding the first 5,000 as burn-in. For the smoothing spline method, the R function ‘smooth.spline’ was used. Finally, the Gaussian process approach used a frequentist implementation given in the R package ‘GPFDA’.

Table 1 gives the integrated mean squared error of the various approaches. All numbers marked with an asterisk are significantly different from local extremum splines. The integrated mean squared error of the local extremum approach is never larger than that of the other methods, and in most cases the difference is significant at the 0·05 level. Generally, when there is a high signal-to-noise ratio, the methods perform similarly, but when the ratio decreases, specifically in flat regions, the local extremum approach was superior as it removed artifactual bumps from the estimate.

Table 1.

Estimated mean squared error for all functions. For each function, the left value represents the simulation condition σ2 = 1 and the right value represents the simulation condition σ2 = 4. Asterisks signify that the number is significantly different than the local extremum spline at the one-sided 0·05 level.

True function   Local extremum splines   Smoothing splines   Bayesian P-splines   Gaussian process
f1              1·60/0·49                2·11*/0·58          2·28*/0·55           2·15*/0·71*
f2              2·59/0·09                4·19*/0·13*         3·82*/0·11*          5·26*/0·15*
f3              1·57/0·49                2·43*/0·67*         2·26*/0·92*          2·64*/0·79*
f4              1·70/0·49                2·10*/0·56*         2·15*/0·49           1·90*/0·59*
f5              2·55/0·61                3·69*/1·12*         3·39*/0·98*          3·90*/1·14*
f6              2·17/0·69                2·57/0·72           5·16*/0·72           2·44/0·79*
f7              2·38/0·66                3·39*/1·05*         3·96*/0·85*          3·30*/0·90*

4·3. Hypothesis Testing

We perform a simulation experiment investigating the method’s ability to correctly identify the shape of the response function for three sets of hypotheses. In the first case, the null hypothesis is the set of all functions with one or more extrema, and the alternative, ℍ1, is the set of all monotone functions. In the second test, the null consists of all monotone functions, and the alternative, ℍ2, is all functions with one or more extrema. Finally, for the third test the null hypothesis is the set of functions having at most one extremum, and the alternative, ℍ3, is the set of functions with two extrema, first having a local maximum followed by a local minimum. Functions are defined on X = [0, 1]. The nine functions used in this simulation are:

$$\mathbb{H}_1:\quad g_1(x) = 2 + 0{\cdot}5x + \Phi\{(x - 0{\cdot}5)/0{\cdot}071\},\quad g_2(x) = 0{\cdot}5\sin\{2\pi(x + 8)\} + 4{\cdot}75x,\quad g_3(x) = 1 + 2{\cdot}25x,$$

$$\mathbb{H}_2:\quad g_4(x) = 4(x - 0{\cdot}75)^2,\quad g_5(x) = 1 + 2x - 1{\cdot}56\exp\{-50(x - 0{\cdot}5)^2\},\quad g_6(x) = 15(x - 0{\cdot}5)^3\,1(x < 0{\cdot}5) + 0{\cdot}3(x - 0{\cdot}5) - \exp\{-250(x - 0{\cdot}25)^2\},$$

$$\mathbb{H}_3:\quad g_7(x) = 0{\cdot}85\sin\{2\pi(x + 8)\} + 4{\cdot}75x,\quad g_8(x) = g_5(x),\quad g_9(x) = 5\sin(2\pi x)/(x + 0{\cdot}75)^3 - 2{\cdot}5(x + 10{\cdot}5) + 2.$$

For the simulation, data are generated assuming yi = gj(xi) + εi, where εi ~ N(0, σ2) and σ2 = 1. We consider sample sizes n = 100, 200, 300, and 400, with 50 data sets constructed for each sample condition, where points are sampled evenly across X. The local extremum approach is as above, except that 150,000 posterior samples are taken, with the first 10,000 discarded as burn-in. For tests ℍ1 and ℍ2, the local extremum approach is compared with the Bayesian method of Salomond (2014) and the frequentist methods of Baraud et al. (2005) and Wang & Meyer (2011). For the method of Baraud et al. we use the test where n = 25, and for the method of Wang and Meyer we use k = 4 splines, which were the most powerful tests presented in the respective articles.

The Bayesian tests produce Bayes factors, while the frequentist tests have corresponding test statistics. We compare the methods based upon area under the receiver operating curve. For the simulation, the false positive rate was computed from the values of the test statistics for the other functions not in the test set. As a frequentist calibration of our Bayesian test, one can choose a threshold on the Bayes factor to control the type I error rate at a specified level based on an approximation to the distribution of the Bayes factor under the null hypothesis. We describe this approximation in the Supplementary Material.
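The area under the receiver operating curve can be computed from two sets of test statistics without tracing the curve, using its equivalence with the Mann-Whitney statistic. The sketch below assumes that larger values favor the alternative; the function name is hypothetical, and the inputs could equally be Bayes factors or frequentist test statistics.

```python
def auc(pos_scores, neg_scores):
    """Probability that a randomly chosen positive case scores higher than a randomly
    chosen negative case, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Here `pos_scores` would hold the statistics for data sets generated under the hypothesis being tested for and `neg_scores` those for the remaining functions.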

Figure 1 shows the receiver operating curve for hypothesis ℍ1. This shows that the local extremum approach is superior to the other three approaches across all false positive rates. Further, the estimated area under the receiver operating curve is 0·94, better than the approaches of Salomond at 0·86, Baraud at 0·77, and Wang and Meyer at 0·74. When looking at the impact of sample size on the tests, the power of the local extremum approach increases as the sample size increases, does so at a rate greater than that of its competitors, and is similarly superior for hypothesis ℍ2 (data not shown).

Fig. 1.

The receiver operating curve for the four tests defined for hypothesis ℍ1 for all 1,400 simulations. The black line represents the local extremum spline, dashed line the approach of Salomond, dashed-dotted line the approach of Baraud, and dotted line the approach of Wang and Meyer.

For hypothesis ℍ3, there is no equivalent methodology in the literature, but the performance of our approach is excellent. The area under the receiver operating curve is 0·94. For the Bayes factor cut point of 6, Table 2 gives results across all simulation conditions. Our test achieves high power for function g7, even though this function is only slightly different from g3. Function g8 is the same as g5; this simulation gives evidence that the departure from monotonicity may be due to the pronounced U shape in the data and not necessarily because there are two extrema, so that more data are required to conclude in favor of ℍ3.

Table 2.

Percent of samples where the model was correctly chosen as having two extrema, which is hypothesis ℍ3, using a cut point of 6.

Function   n = 100   n = 200   n = 300   n = 400
g7         78        90        98        96
g8         14        32        22        46
g9         76        88        98        100

4·4. Seasonal Influenza and Pneumonia Death Rate

In temperate climates, the prevalence of influenza peaks in the winter months while dropping in the warmer months. Estimating this seasonal effect, as well as departures from it, may be of interest when estimating the magnitude of an influenza epidemic. Here, we expect a peak in the winter months followed by a trough in the summer months. Parametric models for this pattern may not be adequate to model the observed phenomena, and smoothing approaches do not guarantee this pattern. We use local extremum splines, setting H = 2, to estimate this trend for Virginia, North Carolina and South Carolina for data collected by the Centers for Disease Control and Prevention National Center for Health Statistics Mortality surveillance branch.

Figure 2 plots the estimated mortality rates, obtained using an additive model defined by a quadratic trend representing a decrease in mortality over time, a seasonal component defined using a local extremum spline, and a P-spline that represents departures from the overall trend. This seasonal component is different from the trend published by the Centers for Disease Control (Viboud et al., 2010), mainly due to the asymmetry in the local extremum approach during the winter months, which cannot be captured by a single sinusoidal function.

Fig. 2.

Estimate of the expected rate of seasonal influenza and pneumonia deaths using the local extremum spline, black line, compared to the observed rate of influenza and pneumonia deaths estimated using the Centers for Disease Control’s standard approach, gray line. Dots represent observed state level influenza and pneumonia percentages.

Supplementary Material

Supplemental

Acknowledgments

The authors would like to thank the referees and associate editor for comments on earlier versions of this manuscript. This research was partially supported by a grant from the National Institute of Environmental Health Sciences of the United States National Institutes of Health.

Appendix 1

Proofs of results

Proof of Proposition 1

It is well known that $\sum_{k=1}^{K+j-1} \beta_k B_{(j,k)}(x)$ is continuous for j ≥ 1 and for all x ∈ X. Further, $\prod_{h=1}^{H} (x - \alpha_h)$ is a polynomial; therefore, $\prod_{h=1}^{H} (x - \alpha_h) \sum_{k=1}^{K+j-1} \beta_k B_{(j,k)}(x)$ is continuous with anti-derivative $\sum_{k=0}^{K+j-1} \beta_k B^{*}_{(j,k)}(x)$.

If βk ≥ 0 for all k ≥ 1, then $\sum_{k=1}^{K+j-1} \beta_k B_{(j,k)}(x) \ge 0$ for all x ∈ X and $f'(x) = \prod_{h=1}^{H} (x - \alpha_h) \sum_{k=1}^{K+j-1} \beta_k B_{(j,k)}(x)$ can only change sign when x = αh. Thus, there are at most H local extrema interior to X, with f ∈ ℱH. □

Proof of Theorem 1

Consider f0 ∈ ℱH, where f0 has exactly H change points. Functions with fewer than H change points can be modeled by removing the required change point parameters from X and continuing with the proof below.

Let fBS be a taut B-spline approximation of f0 of order j + 1 defined on the knot set T having exactly H extrema such that

$$\| f_0 - f_{BS} \|_\infty < \Delta C.$$

Here fBS is defined on T, where Δ = maxk |τk − τk+j| < 1. As f0 and fBS are continuous and differentiable, we define C such that ‖f0′‖∞ < C < ∞ and ‖fBS′‖∞ < C. The measurable set of taut spline functions $L_{f_{BS}} = \{f_{BS} : \| f_0 - f_{BS} \|_\infty < \Delta C\}$ can be shown to exist (de Boor, 2001) and we define a map $G : L_{f_{BS}} \to L_{f_{LX}}$, where $L_{f_{LX}}$ is a subset of all possible local extremum spline functions with H change points. Consider

$$\| f_{BS} - f_{LX} \|_\infty = \sup_{x \in X} | f_{BS}(x) - f_{LX}(x) | \qquad (A1)$$

and let β0 = fBS(0). For the exactly H extrema $\alpha_1^{BS} < \ldots < \alpha_H^{BS}$ in fBS defined by the taut spline, set $\alpha_h = \alpha_h^{BS}$. Additionally, if $f_{BS}(\alpha_1^{BS}) - f_{BS}(0) \ge 0$ with H odd, then set M = −1; otherwise set M = 1. In the case where $f_{BS}(\alpha_1^{BS}) - f_{BS}(0) < 0$ with H odd, set M = 1; otherwise set M = −1.

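Rewriting the right hand side of (A1) in a form based upon the derivative, we have

$$\sup_{x \in X} \left| \int_{-\infty}^{x} \sum_{k=1}^{K+j-1} \{ \kappa_k B_{(j,k)}(\xi) - \beta_k G(\xi) B_{(j,k)}(\xi) \}\, d\xi \right| \le \sum_{k=1}^{K+j-1} \sup_{x \in X} \left| \int_{\tau_k}^{x} \{ \kappa_k B_{(j,k)}(\xi) - \beta_k G(\xi) B_{(j,k)}(\xi) \}\, d\xi \right|,$$

where the derivative of fBS is based upon the derivative formula for B-splines (de Boor, 2001) and $G(\xi) = \prod_{h=1}^{H} (\xi - \alpha_h)$.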

Because of the taut spline construction of fBS, we know that for all k, h such that αh ∉ [τk, τk+j−1] one has sgn(κk) = sgn(G(x)), for all x ∈ [τk, τk+j−1]. Here sgn(·) is the signum function. On each of these intervals let

$$\beta_k = \frac{\int_{\tau_k}^{\tau_{k+j-1}} \kappa_k B_{(j,k)}(\xi)\, d\xi}{\int_{\tau_k}^{\tau_{k+j-1}} G(\xi) B_{(j,k)}(\xi)\, d\xi}.$$

As B(j,k)(x) ≥ 0, we have βk ≥ 0; further, one has

$$\int_{\tau_k}^{\tau_{k+j-1}} \{ \kappa_k B_{(j,k)}(\xi) - \beta_k G(\xi) B_{(j,k)}(\xi) \}\, d\xi = 0$$

for all intervals such that αh ∉ [τk, τk+j−1].

For the at most H coefficients defined on splines that are nonzero in the intervals with αh ∈ [τk, τk+j−1], set these coefficients to zero. As there are a finite number of intervals whose error is non-zero and fBS is bounded, the maximum error is at most (H + 1)(j + 1)ΔC for any x and

$$\| f_{BS} - f_{LX} \|_\infty \le (H + 1)(j + 1) \Delta C.$$

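Consequently, for any ε, consider taut B-spline constructions on knot sets T such that Δ ≤ ε[{2(H + 1)(j + 1)}C]^{−1} that also have ‖f0 − fBS‖∞ < ε/2. Then one has

$$\| f_0 - f_{LX} \|_\infty \le \| f_0 - f_{BS} \|_\infty + \| f_{BS} - f_{LX} \|_\infty \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$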

completing the proof. □
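As an illustration of the knot-spacing requirement used in the proof, the sketch below computes the largest admissible Δ for a target ε and confirms that the resulting spline-to-spline error bound equals ε/2. The function names and the numerical values of ε, H, j and C are hypothetical and chosen only for this example.

```python
def max_knot_spacing(eps, H, j, C):
    """Largest Delta satisfying Delta <= eps / {2 (H + 1) (j + 1) C}."""
    if eps <= 0 or C <= 0:
        raise ValueError("eps and C must be positive")
    return eps / (2 * (H + 1) * (j + 1) * C)


def spline_error_bound(delta, H, j, C):
    """Upper bound (H + 1)(j + 1) Delta C on the distance between f_BS and f_LX."""
    return (H + 1) * (j + 1) * delta * C


# Hypothetical values: two change points, j = 3, derivative-type bound C = 5.
eps, H, j, C = 0.1, 2, 3, 5.0
delta = max_knot_spacing(eps, H, j, C)
bound = spline_error_bound(delta, H, j, C)
```

With these values Δ = 0.1/120, so any knot set with spacing at most Δ leaves at most ε/2 for the second term of the triangle inequality.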

Proof of Lemma 1

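The function G in Theorem 1 is measurable. If $L_{f_{BS}}$ is measurable on some abstract measure space, one has $\mathrm{pr}(\|f_{LX} - f_0\| < \epsilon \mid T_{\mathcal{M}}) > 0$ for any $\epsilon > 0$ and some $T_{\mathcal{M}}$. Given that the prior puts probability over knot sets having knot spacings that are arbitrarily close, that is, $\Delta \le \epsilon[\{2(H + 1)(j + 1)\}C]^{-1}$ as in Theorem 1, we conclude that $\mathrm{pr}(\|f_0 - f_{LX}\| < \epsilon) \ge \mathrm{pr}(\|f_{LX} - f_0\| < \epsilon \mid T_{\mathcal{M}})\, \mathrm{pr}(T_{\mathcal{M}}) > 0$ for all $\epsilon > 0$. □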

Proof of Theorem 2

We verify conditions A1 and A2 of Theorem 1 of Choi & Schervish (2007). Because there is positive prior probability (Lemma 1) within all neighborhoods of $(f_0, \sigma^2)$, one can use Choi & Schervish (2007), section 4, to show that the conditions of A1 of Theorem 1 are met. To verify A2, note that ℱH+ and ℱH− are subsets of all continuously differentiable functions on X, which were considered in Choi & Schervish (2007); consequently, we appeal to Theorems 2 and 3 of Choi & Schervish (2007) to construct suitable tests for both random and fixed designs using Wε,n and Uε. We need only verify (iii) in part A2.

As in Choi & Schervish (2007), assume that $M_n = O(n^{\alpha})$ with $1/2 < \alpha < 1$. We show that $\mathrm{pr}(\|f_{LX}(x)\| > M_n) \le C_0 \exp(-nC_1)$ and $\mathrm{pr}(\|f'_{LX}(x)\| > M_n) \le C_2 \exp(-nC_3)$ for some $C_0, C_1, C_2, C_3 > 0$. Define $B_{(j,k,M,\alpha)}(X)$ as the design matrix given model ℳ and a particular α configuration. Let $A = \sup_{M,k,\alpha,x} |B_{(j,k,M,\alpha)}(X)|$ and let $K_{\mathcal{M}}$ be the number of spline coefficients in model ℳ; then

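$$
\begin{aligned}
\mathrm{pr}\{\|f_{LX}(x)\| > M_n\}
&= \int \mathrm{pr}\left\{ \left\| \sum_{k=1}^{K_{\mathcal{M}}} \beta_k B_{(j,k,M,\alpha)}(X) \right\| > M_n \,\middle|\, \mathcal{M} \right\} d\alpha\, dM\, d\pi\, d\lambda \\
&\le \int \mathrm{pr}\left\{ \left\| \sum_{k=1}^{K_{\mathcal{M}}} \beta_k B_{(j,k,M,\alpha)}(X) \right\| > M_n \,\middle|\, \mathcal{M}, \beta > 0 \right\} d\alpha\, dM\, d\pi\, d\lambda \\
&\le \int \mathrm{pr}\left\{ \sum_{k=1}^{K_{\mathcal{M}}} \beta_k A > M_n \,\middle|\, \mathcal{M}, \beta > 0 \right\} d\alpha\, dM\, d\pi\, d\lambda \\
&\le \int \exp(-M_n t) \sum_{\mathcal{M}} \left\{ \left( \frac{\lambda - \pi t}{\lambda - t} \right)^{K_{\mathcal{M}}} \mathrm{pr}(\mathcal{M}) \right\} d\alpha\, d\pi\, d\lambda, \qquad (A2)
\end{aligned}
$$

where the last inequality follows from the Chernoff bound.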
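The factor (λ − πt)/(λ − t) in (A2) has the form of a moment generating function. The sketch below assumes, as a labeled assumption not stated in this appendix, that each coefficient βk is zero with probability π and exponential with rate λ otherwise; under that assumption E{exp(tβk)} = π + (1 − π)λ/(λ − t) = (λ − πt)/(λ − t) for 0 < t < λ. The function names are hypothetical, and the constant A is taken to be 1 for simplicity.

```python
from math import exp


def mgf_closed_form(t, lam, pi):
    """(lam - pi t) / (lam - t), valid for 0 < t < lam."""
    if not 0 < t < lam:
        raise ValueError("need 0 < t < lam")
    return (lam - pi * t) / (lam - t)


def mgf_numeric(t, lam, pi, upper=60.0, n=100_000):
    """E{exp(t beta)} by midpoint quadrature, assuming a point mass at zero
    with weight pi and an exponential(lam) density with weight 1 - pi."""
    h = upper / n
    slab = h * sum(lam * exp(-(lam - t) * (i + 0.5) * h) for i in range(n))
    return pi + (1.0 - pi) * slab


def chernoff_tail_bound(m, t, lam, pi, n_coef):
    """Chernoff bound exp(-m t) E{exp(t beta)}^n_coef on pr(sum of beta_k > m)
    for n_coef independent coefficients (assumption: A = 1)."""
    return exp(-m * t) * mgf_closed_form(t, lam, pi) ** n_coef


# Hypothetical parameter values for illustration only.
lam, pi, t = 2.0, 0.3, 0.5
closed = mgf_closed_form(t, lam, pi)
numeric = mgf_numeric(t, lam, pi)
```

The closed form and the quadrature agree to within the discretization error, and the resulting tail bound decays exponentially in the threshold, which is the behavior the proof relies on.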

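Now let $\mathrm{pr}^*(\mathcal{M})$ be the probability of a branching process where $\zeta < 0.5$ is constant for all children; then there exists a $K$ such that $\{\mathrm{pr}^*(\mathcal{M})\}^2 \ge \mathrm{pr}(\mathcal{M})$ for all ℳ such that $K_{\mathcal{M}} \ge K$. Partition the sum into the finite sum where $K_{\mathcal{M}} < K$ and the infinite sum where $K_{\mathcal{M}} \ge K$. As the finite sum is finite for all $0 < t < \lambda$, continuing with (A2):

$$
\begin{aligned}
&\le \int \exp(-M_n t) \left\{ C_1 + \left[ \sum_{K_{\mathcal{M}} \ge K} \left( \frac{\lambda - \pi t}{\lambda - t} \right)^{K_{\mathcal{M}}} \{\mathrm{pr}^*(\mathcal{M})\}^2 \right] \right\} d\alpha\, d\pi\, d\lambda \\
&\le \int \exp(-M_n t) \left\{ C_1 + C_2 \left[ \sum_{K_{\mathcal{M}} \ge K} \left( \frac{\lambda - \pi t}{\lambda - t}\, \zeta \right)^{K_{\mathcal{M}}} \mathrm{pr}^*(\mathcal{M}) \right] \right\} d\alpha\, d\pi\, d\lambda \\
&\le \int \exp(-M_n t) (C_1 + C_2)\, d\alpha\, d\pi\, d\lambda,
\end{aligned}
$$

where the last inequality holds because λ is bounded away from zero, which implies one can choose some $t < \lambda$ such that $\{(\lambda - \pi t)/(\lambda - t)\}\zeta < 1$. This implies that

$$\mathrm{pr}\{\|f_{LX}(x)\| > M_n\} \le C_0 \exp(-nC_1).$$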
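The existence of a suitable t can be made explicit. Since λ − t > 0, the requirement {(λ − πt)/(λ − t)}ζ < 1 rearranges to t(1 − πζ) < λ(1 − ζ), that is, t < λ(1 − ζ)/(1 − πζ). The sketch below, with hypothetical function names and parameter values, computes this limit and checks that any smaller positive t makes the geometric ratio less than one.

```python
def t_upper_limit(lam, pi, zeta):
    """Supremum of t for which {(lam - pi t) / (lam - t)} zeta < 1."""
    if lam <= 0 or not 0 <= pi <= 1 or not 0 < zeta < 0.5:
        raise ValueError("need lam > 0, 0 <= pi <= 1 and 0 < zeta < 0.5")
    return lam * (1 - zeta) / (1 - pi * zeta)


def geometric_ratio(t, lam, pi, zeta):
    """The per-coefficient factor {(lam - pi t) / (lam - t)} zeta."""
    if not 0 < t < lam:
        raise ValueError("need 0 < t < lam")
    return (lam - pi * t) / (lam - t) * zeta


# Hypothetical parameter values for illustration only.
lam, pi, zeta = 2.0, 0.3, 0.4
t_star = t_upper_limit(lam, pi, zeta)
ratio_inside = geometric_ratio(t_star / 2, lam, pi, zeta)
```

For π < 1 the limit is strictly below λ, so the admissible interval (0, t*) is nonempty and lies inside the region where the moment generating function is finite.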

A similar derivation shows that $\mathrm{pr}(\|f'_{LX}(x)\| > M_n) \le C_2 \exp(-nC_3)$. One can find a $B = \sup_{M,k,\alpha,x} |B'_{(j,k,M,\alpha)}(X)|$ and substitute $B$ for $A$ and $B'_{(j,k,M,\alpha)}(X)$ for $B_{(j,k,M,\alpha)}(X)$ in the above derivation.

Contributor Information

M. W. WHEELER, National Institute for Occupational Safety and Health, 1150 Tusculum Avenue, Cincinnati, Ohio 45226, MS C-15

D. B. DUNSON, Department of Statistical Science, Duke University, Box 90251, Durham, NC 27708

A. H. HERRING, Department of Statistical Science, Duke University, Box 90251, Durham, NC 27708

References

  1. Baraud Y, Huet S, Laurent B. Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function. Annals of Statistics. 2005:214–257.
  2. Biller C. Adaptive Bayesian regression splines in semiparametric generalized linear models. Journal of Computational and Graphical Statistics. 2000;9:122–140.
  3. Bornkamp B, Ickstadt K. Bayesian nonparametric estimation of continuous monotone functions with applications to dose–response analysis. Biometrics. 2009;65:198–205. doi: 10.1111/j.1541-0420.2008.01060.x.
  4. Choi T, Schervish MJ. On posterior consistency in nonparametric regression problems. Journal of Multivariate Analysis. 2007;98:1969–1987.
  5. de Boor C. A Practical Guide to Splines. New York: Springer Verlag; 2001.
  6. DiMatteo I, Genovese CR, Kass RE. Bayesian curve-fitting with free-knot splines. Biometrika. 2001;88:1055–1071.
  7. Feller W. Introduction to Probability Theory and Its Applications. I. New York: John Wiley and Sons; 1974.
  8. Genz A. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics. 1992;1:141–149.
  9. Genz A, Kwong KS. Numerical evaluation of singular multivariate normal distributions. Journal of Statistical Computation and Simulation. 2000;68:1–21.
  10. Geyer CJ. Markov chain Monte Carlo maximum likelihood. In: Keramidas EM, editor. Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. Red Hook, NY: Interface Foundation of North America; 1991. pp. 1–8.
  11. Geyer CJ. Importance sampling, simulated tempering and umbrella sampling. In: Brooks S, Gelman A, Jones G, Meng X, editors. Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC; 2011. pp. 295–311.
  12. Godsill SJ. On the relationship between Markov chain Monte Carlo methods for model uncertainty. Journal of Computational and Graphical Statistics. 2001;10:230–248.
  13. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
  14. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach. Boca Raton: CRC Press; 1993.
  15. Hans C, Dunson D. Bayesian inferences on umbrella orderings. Biometrics. 2005;61:1018–1026. doi: 10.1111/j.1541-0420.2005.00373.x.
  16. Holmes C, Mallick B. Generalized nonlinear modeling with multivariate free-knot regression splines. Journal of the American Statistical Association. 2003;98:352–368.
  17. Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13:183–212.
  18. Lavine M, Mockus A. A nonparametric Bayes method for isotonic regression. Journal of Statistical Planning and Inference. 1995;46:235–248.
  19. Meyer M. Inference using shape-restricted regression splines. The Annals of Applied Statistics. 2008;2:1013–1033.
  20. Meyer MC, Hackstadt AJ, Hoeting JA. Bayesian estimation and inference for generalised partial linear models using shape-restricted splines. Journal of Nonparametric Statistics. 2011;23:867–884.
  21. Neelon B, Dunson D. Bayesian isotonic regression and trend analysis. Biometrics. 2004;60:398–406. doi: 10.1111/j.0006-341X.2004.00184.x.
  22. Ramgopal P, Laud P, Smith A. Nonparametric Bayesian bioassay with prior constraints on the shape of the potency curve. Biometrika. 1993;80:489–498.
  23. Ramsay J. Monotone regression splines in action. Statistical Science. 1988;3:425–441.
  24. Salomond JB. Adaptive Bayes test for monotonicity. In: The Contribution of Young Researchers to Bayesian Statistics. New York: Springer; 2014. pp. 29–33.
  25. Scott JG, Shively TS, Walker SG. Nonparametric Bayesian testing for monotonicity. Biometrika. 2015;102:617–630.
  26. Shi JQ, Choi T. Gaussian Process Regression Analysis for Functional Data. Boca Raton: CRC Press; 2011.
  27. Shively T, Sager T, Walker S. A Bayesian approach to non-parametric monotone function estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71:159–175.
  28. Shively T, Walker S, Damien P. Nonparametric function estimation subject to monotonicity, convexity and other shape constraints. Journal of Econometrics. 2011;161:166–181.
  29. Viboud C, Miller M, Olson DR, Osterholm M, Simonsen L. Preliminary estimates of mortality and years of life lost associated with the 2009 A/H1N1 pandemic in the US and comparison with past influenza seasons. Public Library of Science: Currents Influenza. 2010;2:1–7. doi: 10.1371/currents.RRN1153.
  30. Walker S, Damien P, Lenk P. On priors with a Kullback–Leibler property. Journal of the American Statistical Association. 2004;99:404–408.
  31. Wang JC, Meyer MC. Testing the monotonicity or convexity of a function using regression splines. Canadian Journal of Statistics. 2011;39:89–107.
