PLOS ONE. 2022 Jun 30;17(6):e0270310. doi: 10.1371/journal.pone.0270310

Bayesian model averaging for nonparametric discontinuity design

Max Hinne 1,*, David Leeftink 1, Marcel A J van Gerven 1, Luca Ambrogioni 1
Editor: Lalit Chandra Saikia
PMCID: PMC9246148  PMID: 35771833

Abstract

Quasi-experimental research designs, such as regression discontinuity and interrupted time series, allow for causal inference in the absence of a randomized controlled trial, at the cost of additional assumptions. In this paper, we provide a framework for discontinuity-based designs using Bayesian model averaging and Gaussian process regression, which we refer to as ‘Bayesian nonparametric discontinuity design’, or BNDD for short. BNDD addresses the two major shortcomings in most implementations of such designs: overconfidence due to implicit conditioning on the alleged effect, and model misspecification due to reliance on overly simplistic regression models. With the appropriate Gaussian process covariance function, our approach can detect discontinuities of any order, and in spectral features. We demonstrate the usage of BNDD in simulations, and apply the framework to determine the effect of running for political positions on longevity, of an alleged historical phantom border in the Netherlands on Dutch voting behaviour, and of Kundalini Yoga meditation on heart rate.

Introduction

The bread and butter of scientific research is the randomized-controlled trial (RCT) [1]. In this design, the sample population is randomly divided into two groups; one that is manipulated (e.g. a drug is administered or a treatment is performed), while the other is left unchanged. RCT allows one to perform causal inference, and learn about the causal effect of the intervention [2, 3]. However, in practice there may be insurmountable ethical or pragmatic hurdles that prevent one from using an RCT. Luckily, all is not lost for experimental design. There exist several quasi-experimental designs (QEDs) that replace random assignment with deterministic assignment, which still allow for causal inferences, but at the cost of additional assumptions [4]. Prominent examples are regression discontinuity (RD) and interrupted time series (ITS) designs, which assign a sample to one of the two groups based on whether it passes a threshold on an assignment variable [5–7]. The idea behind these approaches is that, around the assignment threshold, observations are distributed essentially randomly, so that locally the conditions of an RCT are recreated [8, 9]. The methodological pipeline of quasi-experimental designs like these generally consists of three steps [10]. First, a regression (typically linear) is fit to each of the two groups individually. Next, the regressions are extrapolated to the threshold (RD), or to the entire post-intervention range (ITS). Finally, the difference between the extrapolations of the two groups is taken as the effect size of the intervention. A straightforward statistical test can be applied to check whether the effect is present.

Here, we provide a novel framework for such approaches, which we call ‘Bayesian nonparametric discontinuity design’ (BNDD). The main innovations of BNDD are: First, we frame the problem of detecting an effect as Bayesian model comparison. Instead of comparing the pre- and post-intervention regressions, we introduce a continuous model and a discontinuous model. In the discontinuous model, observations before and after the intervention are assumed to be independent, while in the continuous model this assumption is lifted. We quantify the evidence in favor of either model, rather than only for the alternative model, via Bayesian model comparison [11]. This enables the computation of the marginal effect size via Bayesian model averaging, which provides a more nuanced estimate compared to implicitly conditioning on the alternative model [12, 13]. Furthermore, the model comparison approach automatically penalizes the discontinuous model for its additional flexibility [14]. Second, we use Gaussian process (GP) regression to avoid strong parametric assumptions. The result is a flexible model that can capture nonlinear interactions between the predictor and outcome variables. Traditional assumptions, such as linearity, can still be implemented in our model by using the appropriate covariance function. At the same time, much more expressive covariance functions can be used, such as the spectral mixture kernel [15], that better capture long-range correlations, and lead to more accurate inference. Lastly, in most discontinuity-based methods for quasi-experimental design, a bandwidth parameter determines the trade-off between estimation reliability and the local randomness assumptions that are needed to draw causal inferences [16]. In BNDD, all observations are used to estimate both the continuous and discontinuous model, but by optimizing the length-scale parameter of the GP covariance functions we control the sensitivity to different types of discontinuities and adherence to locality assumptions.

Related work

While quasi-experimental designs have been around since the 1960s [5, 17], recently there has been a renewed interest in this class of methods [9, 18], in particular in epidemiology [19] and education [20]. Researchers from different domains are promoting the use of QED [21–23], which has prompted several extensions of classical QEDs. For instance, several authors have proposed to use Bayesian models for QED [16, 24]. By assuming a prior distribution for the alleged effect size and using Bayes’ theorem, these studies provide an explicit description of the estimation uncertainty. In contrast to our work, these methods focus on the estimation of the treatment effect instead of model comparison, and typically assume restrictive parametric forms. Other studies have considered nonparametric alternatives to linear models. For example, [25] use locally linear nonparametric regression. Alternatively, one can use kernel methods that compute a smoothly weighted average of the data points to create an interpolated regression that does not depend on a specific parametric form [20]. Other studies have considered using Gaussian process regression for regression discontinuity as well [10, 26]. Here, instead of fitting a parametric form such as linear regression, the regression is modelled by a GP, which results in a flexible, nonparametric model and more accurate effect size estimates compared to linear regression. BNDD uses GP regression as well, but whereas [10, 26] focus on the inference of the magnitude of the treatment effect, we first determine whether an effect is present at all using Bayesian model comparison [11], and we use Bayesian model averaging [12] to reduce the overconfidence that follows from conditioning on the alternative model or on particular covariance functions. Consequently, BNDD is less prone to false positives, is able to detect discontinuities in derivatives of the latent function rather than in the function per se, and, using the spectral mixture kernel, is well-suited for detecting changes in time series, which is crucial in ITS design.

Discontinuity-based causal inference

We provide a brief introduction to the background of causal inference using RD and ITS designs; for a more in-depth discussion, we refer to e.g. [7, 27]. The detection of a causal effect is naturally formulated using the potential outcomes framework [28], which assumes that for each individual in the study both the outcome of the treatment and its alternative can potentially be observed.

Consider an observation i (with or without temporal ordering) with independent variable xi ∈ ℝ^p and response yi ∈ ℝ^q (we will assume p = q = 1, but multidimensional extensions are straightforward). In addition, we observe an indicator variable zi, where zi = 1 denotes that the intervention of interest has been applied to case i, and zi = 0 indicates it has not. The outcome depends on treatment, so

y_i = \begin{cases} y_i(0) & \text{if } z_i = 0, \\ y_i(1) & \text{if } z_i = 1. \end{cases} \qquad (1)

The individual causal effect is defined as the difference between these two potential outcomes, that is di = yi(1) − yi(0). Since we only ever observe one outcome, the individual causal effect is out of reach, so in RD design we focus on the average causal effect (ACE) instead, defined by the differences in the expectations:

d_\text{ACE} = \mathbb{E}[y(1)] - \mathbb{E}[y(0)]. \qquad (2)

In the randomized controlled trial, the assignment of treatment zi is random, so that all differences other than due to the treatment are integrated out in these expectations [20]. In QED designs such as RD and ITS however, the allocation to intervention or control group is based on a threshold x0 [29]:

z_i = \begin{cases} 1 & \text{if } x_i \geq x_0, \\ 0 & \text{otherwise.} \end{cases} \qquad (3)

This changes how the ACE is computed, which for RD design becomes [8, 30]:

d_\text{RD} = \mathbb{E}[y_i(1) - y_i(0) \mid x_i = x_0] = \lim_{x \downarrow x_0} \mathbb{E}[y_i \mid x_i = x] - \lim_{x \uparrow x_0} \mathbb{E}[y_i \mid x_i = x], \qquad (4)

provided the distributions of yi given xi are continuous in x, and the conditional expectations E[yi(1)|xi] and E[yi(0)|xi] exist.

For interrupted time series, there are no post-intervention control observations, as all post-threshold observations xi ≥ x0 are in the intervention group. Here, the causal estimand becomes the average effect of the treatment on the treated (ATT) [31]:

d_\text{ITS}(x_i) = \mathbb{E}[y_i(1) - y_i(0) \mid x_i \geq x_0] = \mathbb{E}[y_i \mid x_i, \mathcal{D}] - \mathbb{E}[y_i \mid x_i, \mathcal{D}_0], \qquad (5)

for xi ≥ x0, \mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}, and \mathcal{D}_0 = \{(x_i, y_i)\}_{x_i < x_0}. Intuitively, this measure of effect size is the difference between the extrapolation based on the pre-intervention data and the actual post-intervention observations. Due to the reliance on extrapolation, it is crucial that correct assumptions are made about the functional form. For example, assuming linearity will lead to a biased ATT estimate if this does not describe the functional form well.

Importantly, for both approaches we assume there are no confounding variables that affect the relationship between x and y (for a more in-depth discussion of RD design, see e.g. [16]).

Bayesian nonparametric discontinuity design

In standard RD and ITS analyses, causal conclusions are drawn by estimating the effect d and testing whether this differs from zero. Instead, we perform Bayesian model comparison to see whether the data are better explained by the alternative model M1, which claims an effect is present, than by the null model M0, in which such an effect is absent. The result of the model comparison is quantified by the Bayes factor [32]:

\text{BF}_{10} = \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_0)}. \qquad (6)

Here, p(D | M1) and p(D | M0) are the marginal likelihoods of the two models with their respective parameters integrated out. The Bayes factor indicates how much more likely the data are given the discontinuous model, compared to the continuous model [33]. Unlike a p-value, it can provide evidence for either model, so that it is possible to find evidence supporting the absence of a discontinuity [11, 34]. Furthermore, this model comparison approach automatically accounts for model complexity [14].

In the null model, all probability mass of p(d | D, M0) is concentrated at d = 0, while for the alternative model we have an effect size distribution p(d | D, M1). Existing regression discontinuity methods focus on inference of d, and hence implicitly condition on M1. This approach ignores the uncertainty in the model posterior

p(\mathcal{M} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{M})\, p(\mathcal{M})}{\sum_i p(\mathcal{D} \mid \mathcal{M}_i)\, p(\mathcal{M}_i)}, \qquad (7)

where p(Mi) is the prior probability of model i. Ignoring the uncertainty in this distribution results in an overconfident estimate of the effect size, and consequently in overly optimistic conclusions about the efficacy of an intervention. This uncertainty can be accounted for via the Bayesian model average (BMA) estimate of d:

p(d \mid \mathcal{D}) = \sum_{j=0,1} p(d \mid \mathcal{D}, \mathcal{M}_j)\, p(\mathcal{M}_j \mid \mathcal{D}). \qquad (8)

The resulting distribution integrates over the uncertainty of the model, which has been shown to lead to optimal predictive performance [12]. Since the effect size is by definition zero according to M0, Eq (8) is a spike-and-slab distribution that combines a spike at d = 0 with a Gaussian distribution determined by M1, where each component is weighted by the posterior probability of the corresponding model. Compared to the overconfident estimation of d conditioned only on M1, this has a regularizing effect [35], shrinking small effect size estimates towards zero. Note that for now, we assume a uniform prior over the models, such that p(M0)=p(M1)=1/2, but this may be changed, for instance to account for multiple comparisons [36]. We proceed to explain the distributions implied by the two models in more detail.
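As a concrete illustration of Eq (8), the sketch below draws Monte Carlo samples from the spike-and-slab BMA effect size distribution, assuming a uniform model prior, (approximate) log marginal likelihoods for both models, and a Gaussian p(d | D, M1); the function name and signature are illustrative and not part of any released package.

```python
import numpy as np

def bma_effect_size_samples(log_ml0, log_ml1, m, s, n_samples=10_000, rng=None):
    """Sample the spike-and-slab BMA effect size distribution of Eq (8).

    log_ml0, log_ml1 : (approximate) log marginal likelihoods of M0 and M1
    m, s             : mean and standard deviation of p(d | D, M1)
    Assumes a uniform model prior, p(M0) = p(M1) = 1/2.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Posterior probability of M1 is the logistic transform of the log Bayes factor.
    p_m1 = 1.0 / (1.0 + np.exp(-(log_ml1 - log_ml0)))
    # Mixture sampling: spike at d = 0 with probability p(M0 | D), Gaussian slab otherwise.
    from_m1 = rng.random(n_samples) < p_m1
    return np.where(from_m1, rng.normal(m, s, size=n_samples), 0.0), p_m1
```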

The continuous model

The continuous (null) model M0 implies that the regression does not depend on the threshold, which leaves us with a single regression for all data points. We assume Gaussian observation noise:

y_i \sim \mathcal{N}\left(f_0(x_i), \sigma_n^2\right).

Here, σn2 is the observation noise variance, and f0(xi) captures the relationship between the predictor and the response. We do not impose a parametric form on f0, and instead assume f0 follows a Gaussian process (GP) [37]:

f_0 \mid \mathcal{M}_0 \sim \mathcal{GP}\left(\mu(x; \theta_0), k(x, x'; \theta_0)\right),

with mean function μ(x; θ0) and covariance function k(x, x′; θ0). We omit the dependence on the hyperparameters θ when confusion is unlikely to arise.

The discontinuous model

In the alternative model we assume the latent processes before and after x0 are independent. We write

f_1 \mid \mathcal{M}_1 \sim \mathcal{GP}\left(\mu(x; \theta_1), k_1(x, x'; \theta_1)\right), \qquad (9)

where k1(x, x′;θ1) = k(x, x′;θ1) if x and x′ are on the same side of x0, and k1(x, x′;θ1) = 0 otherwise. As a result, the Gram matrix with elements Kij = k1(xi, xj;θ1) is block-diagonal:

K = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}, \qquad (10)

with the elements in the matrices A and B corresponding to the covariances between observations on the same side of the threshold x0. For computational efficiency, the inverse of K can be computed from the separate inverses of these smaller sub-matrices.
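As a sketch of this construction (not the authors' implementation), the block-diagonal Gram matrix of Eq (10) and its block-wise inverse can be assembled as follows, where k is any base covariance function; in practice a noise term would be added to the diagonal before inversion.

```python
import numpy as np

def gram_discontinuous(x, x0, k, jitter=1e-6):
    """Gram matrix of the discontinuous kernel k1 (Eq 10) and its block-wise inverse.

    x  : 1-D array of inputs
    x0 : threshold separating the two groups
    k  : base covariance function k(xi, xj), applied within each group only
    """
    pre, post = x[x < x0], x[x >= x0]
    A = np.array([[k(a, b) for b in pre] for a in pre]) + jitter * np.eye(len(pre))
    B = np.array([[k(a, b) for b in post] for a in post]) + jitter * np.eye(len(post))
    zeros = np.zeros((len(pre), len(post)))
    K = np.block([[A, zeros], [zeros.T, B]])
    # Inverting the two sub-matrices separately is cheaper than inverting K as a whole.
    K_inv = np.block([[np.linalg.inv(A), zeros], [zeros.T, np.linalg.inv(B)]])
    return K, K_inv
```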

Regression discontinuity effect size

Since f1 is continuous everywhere except at x0, we can determine the effect size given M1 by taking the difference of its limits as in Eq (4). The result is a Gaussian distribution:

p(d \mid \mathcal{D}, \mathcal{M}_1) = \mathcal{N}(m, s^2), \qquad (11)

with

m = \lim_{x \downarrow x_0} f_1(x) - \lim_{x \uparrow x_0} f_1(x) \qquad (12)

and

s^2 = \lim_{x \downarrow x_0} \mathbb{V}[f_1(x)] + \lim_{x \uparrow x_0} \mathbb{V}[f_1(x)] = 2\sigma_n^2, \qquad (13)

for stationary covariance functions, where σn2 ∈ θ1 represents the observation noise hyperparameter of the discontinuous model.
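A minimal sketch of how Eqs (12) and (13) might be evaluated with two independently fitted GPflow regression models, one per side of the threshold; predict_f and likelihood.variance are GPflow 2.x API, while the helper itself is illustrative and assumes the two sub-models of M1 have already been trained.

```python
import numpy as np

def rd_effect_size(model_pre, model_post, x0):
    """Mean and variance of p(d | D, M1) as in Eqs (11)-(13).

    model_pre / model_post : GPflow GPR models fitted to the observations
    before / after x0, i.e. the two independent blocks of M1.
    """
    xq = np.array([[x0]])
    mu_post, _ = model_post.predict_f(xq)  # limit from above the threshold
    mu_pre, _ = model_pre.predict_f(xq)    # limit from below the threshold
    m = float((mu_post - mu_pre).numpy().squeeze())
    # Eq (13): sum of the two observation noise variances (2 * sigma_n^2 if shared).
    s2 = float(model_pre.likelihood.variance.numpy() +
               model_post.likelihood.variance.numpy())
    return m, s2
```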

Interrupted time series effect size

In contrast to RD design, in ITS the discontinuity may induce a nonstationarity in the latent process, such as a change in length-scale or frequency. To address this, we allow the hyperparameters pre- and post-intervention to differ, i.e. Aij=k(xi,xj;θ1A) and Bij=k(xi,xj;θ1B). The differences in design also imply a different notion of effect size, which is now a function of x:

p(d(x) \mid \mathcal{D}, \mathcal{M}_1) = \mathcal{N}\left(m(x), s_n^2\right), \qquad (14)

with m(x) = f1(x; θ1^B) − f1(x; θ1^A) and s_n^2 = (σ_n^A)^2 + (σ_n^B)^2. Note that s_n^2 does not depend on x. In practice, we summarize this dynamic effect by its maximum.

Of particular interest in ITS are covariance functions that capture long-range correlations, because these have the potential to extrapolate better and hence provide more accurate effect size estimates. The spectral mixture kernel was designed for this purpose [15]. It is defined as a mixture of Gaussian components in the frequency domain:

S(\omega) = \sum_{q=1}^{Q} w_q \frac{1}{\sigma_q \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\omega - \mu_q}{\sigma_q}\right)^2\right], \qquad (15)

where μq and σq2 are the mean and variance of each component, respectively. This spectral representation is then transformed into a regular stationary covariance function using the inverse Fourier transform [38], which results in

k(\tau) = \sum_{q=1}^{Q} w_q \cos(2\pi\tau\mu_q) \exp\left(-2\pi^2\tau^2\sigma_q^2\right), \qquad (16)

with τ = |x − x′|. The hyperparameters θ = (Q, μ, σ, w) have the following meaning: Q is the number of mixture components, μq indicates the mean frequency of component q, the inverse standard deviation 1/σq can be interpreted as the length-scale of each component, reflecting how quickly that frequency contribution changes with the input x, and the weights wq determine the relative contribution of each component [15].
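For reference, Eq (16) translates directly into code; the following numpy sketch evaluates the spectral mixture covariance for given hyperparameters (the parameter names are ours, not GPflow's).

```python
import numpy as np

def spectral_mixture(tau, weights, means, variances):
    """Spectral mixture covariance k(tau) of Eq (16).

    tau       : array of input distances |x - x'|
    weights   : (Q,) mixture weights w_q
    means     : (Q,) mean frequencies mu_q
    variances : (Q,) spectral variances sigma_q^2
    """
    tau = np.asarray(tau, dtype=float)[..., None]  # broadcast over the Q components
    comps = weights * np.cos(2.0 * np.pi * tau * means) * \
            np.exp(-2.0 * np.pi ** 2 * tau ** 2 * variances)
    return comps.sum(axis=-1)
```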

Model training

The marginal likelihood of Gaussian process regression with Gaussian observation noise is available in closed form [37], but unfortunately this is not the case for the model marginal likelihood that integrates over the hyperparameters θ, which is needed to compute the Bayes factor. We therefore approximate these using the Bayesian Information Criterion (BIC) [39], given by

\log p(\mathcal{D} \mid \mathcal{M}_i) \approx \log p(\mathbf{y} \mid \mathbf{x}, \hat{\theta}, \mathcal{M}_i) - \frac{l}{2} \log n, \qquad (19)

with x = (x1, …, xn)^T and y = (y1, …, yn)^T, l the number of hyperparameters, and \hat{\theta} = \arg\max_{\theta} p(\mathbf{y} \mid \mathbf{x}, \theta, \mathcal{M}_i) the optimized hyperparameters, for i ∈ {0, 1}.
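A minimal sketch of the BIC approximation of Eq (19) for a trained GPflow 2.x regression model; log_marginal_likelihood() and trainable_parameters are GPflow API, while the helper name is illustrative.

```python
import numpy as np

def log_evidence_bic(model, n):
    """BIC approximation (Eq 19) of the log model evidence for a trained GPflow GPR model."""
    log_lik = model.log_marginal_likelihood().numpy()  # log p(y | x, theta_hat, M)
    n_hyper = len(model.trainable_parameters)          # l in Eq (19)
    return log_lik - 0.5 * n_hyper * np.log(n)

# The log Bayes factor of Eq (6) is then approximately
# log_evidence_bic(model_m1, n) - log_evidence_bic(model_m0, n).
```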

BNDD is implemented in Python using GPflow 2.2 [40]. We set the prior mean function to the empirical mean of the observations. The BMA distribution is approximated via Monte Carlo, and visualized with kernel density estimation. Code and data are available on GitHub.

Training the spectral mixture kernel

The number of mixture components Q in the spectral mixture kernel of our ITS approach is optimized in the same way as the other covariance function parameters, that is, by optimizing the GP marginal likelihood. The covariance function mixture parameters are initialized by fitting a Gaussian mixture model to the empirical spectrum obtained via the Lomb-Scargle periodogram, which is applicable for detecting spectral features in (potentially) unevenly sampled data [41].
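One possible initialization along these lines (our sketch, not necessarily the authors' exact procedure) computes the Lomb-Scargle periodogram with scipy, treats it as a density over frequency, and fits a Gaussian mixture to frequencies resampled from it; all names and the frequency grid are assumptions.

```python
import numpy as np
from scipy.signal import lombscargle
from sklearn.mixture import GaussianMixture

def init_spectral_mixture(x, y, Q, n_freq=500, n_resample=2000, seed=0):
    """Initialize spectral mixture (weights, means, variances) from the periodogram."""
    rng = np.random.default_rng(seed)
    dt = np.median(np.diff(np.sort(x)))          # typical sampling interval
    freqs = np.linspace(1e-3, 0.5 / dt, n_freq)  # up to an approximate Nyquist frequency
    power = lombscargle(x, y - y.mean(), 2.0 * np.pi * freqs)
    # Resample frequencies proportionally to their spectral power and fit a GMM.
    samples = rng.choice(freqs, size=n_resample, p=power / power.sum())
    gmm = GaussianMixture(n_components=Q, random_state=seed).fit(samples[:, None])
    return gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel()
```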

Covariance functions as design choices

The choice of the Gaussian process covariance functions plays two conceptually distinct roles in BNDD. First, our choice of covariance function reflects our beliefs about the latent process that generated the observations. In traditional RD designs, one assumes a parametrized model such as (local) linear regression. In BNDD, this explicit parametric form is replaced by a GP prior that assigns a probability distribution to the space of functions. For instance, we may expect functions to be smooth in x, or assume functions are a superposition of sine waves [15, 37]. BNDD can replicate parametrized models by selecting degenerate covariance functions, such as a linear covariance function.

These modeling choices are crucial in RD design as model misspecification can lead to incorrect inference. When we do not have clear prior beliefs about a covariance function, we may compute the Bayesian model average [12] across a set of candidate kernels K:

\text{BF}_{10}^{\text{total}} = \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_0)} = \frac{\sum_{k \in \mathcal{K}} p(\mathcal{D} \mid k, \mathcal{M}_1)\, p(k \mid \mathcal{M}_1)}{\sum_{k \in \mathcal{K}} p(\mathcal{D} \mid k, \mathcal{M}_0)\, p(k \mid \mathcal{M}_0)}. \qquad (20)

Here, the quantity BF10total serves as a final decision metric to determine an effect in a quasi-experimental design, while a detailed report is provided by inspecting the Bayes factors corresponding to each considered covariance function. Similarly, we can compute a marginal effect size across all considered kernels. In practice, the evidence of one covariance function can dominate all others, in which case the BMA procedure converges to performing the analysis with the best covariance function only.
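Given per-kernel evidence approximations (e.g. via Eq 19), the kernel-averaged Bayes factor of Eq (20) reduces to a difference of log-sum-exps under a uniform kernel prior; the sketch below is illustrative rather than the released implementation.

```python
from scipy.special import logsumexp

def total_log_bf(log_ml0, log_ml1):
    """Kernel-averaged log Bayes factor (Eq 20) under a uniform prior over kernels.

    log_ml0, log_ml1 : dicts mapping kernel name -> approximate log evidence
    under the continuous (M0) and discontinuous (M1) model, respectively.
    """
    names = sorted(log_ml0)
    num = logsumexp([log_ml1[name] for name in names])  # log sum_k p(D | k, M1)
    den = logsumexp([log_ml0[name] for name in names])  # log sum_k p(D | k, M0)
    # The uniform kernel prior 1/|K| cancels between numerator and denominator.
    return num - den
```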

The second role of the covariance function choice is that it determines to which types of discontinuities BNDD is sensitive. Importantly, different covariance functions can be used to test fundamentally different hypotheses, as they determine which features of the latent function are part of the alleged effect. For example, the simplest (degenerate) covariance function, the constant function, is sensitive only to differences in the means of the two groups (resulting essentially in a quasi-experimental Bayesian t-test), while the linear covariance function is sensitive to both the difference in mean as well as in slope. In the non-degenerate case, the Matérn covariance function with parameter ν = p + 1/2 can detect discontinuities in up to the p-th derivative. It has two interesting special cases: one is the exponential covariance function (the Matérn kernel with ν = 1/2), which detects only discontinuities in the function itself (and not in its derivatives). This is the nonparametric counterpart of traditional linear regression discontinuity. On the other end is the exponentiated-quadratic covariance function (the Matérn kernel with ν = ∞), which allows us to detect discontinuities of any order, although the amount of data required to detect such subtle effects may become prohibitively large.

Simulations

We evaluate the performance of BNDD in regression discontinuity settings in simulations, using polynomials up to the fifth order, which have been used in other RD design studies as well [30]. We use the linear, exponential, Matérn (ν = 3/2) and exponentiated-quadratic covariance functions and compare the results with two baselines. The first is the Python RDD package, which uses linear regression together with the Imbens-Kalyanaraman bandwidth selection method [30] to select only a subset of the data around x0 to perform the analysis on. The second comparison is with another GP-based approach [26], which first estimates the conditional effect size distribution p(d | M1, D) and then tests the null hypothesis d = 0. We refer to this approach as the 2-stage GP as it combines the GP regression from M1 with a frequentist test.

The true data generating functions used are provided in [30], and are complemented by a simple linear function to see the behaviour when the linearity assumption by the baseline is actually correct. The function definitions are given by

f_\text{Linear}(x) = 0.23 + 0.89x
f_\text{Quad}(x) = \begin{cases} 3x^2 & \text{if } x < x_0, \\ 4x^2 & \text{otherwise.} \end{cases}
f_\text{Cubic}(x) = \begin{cases} 3x^3 & \text{if } x < x_0, \\ 4x^3 & \text{otherwise.} \end{cases}
f_\text{Lee}(x) = \begin{cases} 0.48 + 1.27x + 7.18x^2 + 20.21x^3 + 21.54x^4 + 7.33x^5 & \text{if } x < x_0, \\ 0.48 + 0.84x - 3.0x^2 + 7.99x^3 - 9.01x^4 + 3.56x^5 & \text{otherwise.} \end{cases}
f_\text{CATE1}(x) = 0.42 + 0.84x - 3.0x^2 + 7.99x^3 - 9.01x^4 + 3.56x^5
f_\text{CATE2}(x) = 0.42 + 0.84x + 7.99x^3 - 9.01x^4 + 3.56x^5
f_\text{Ludwig}(x) = \begin{cases} 3.71 + 2.3x + 3.28x^2 + 1.45x^3 + 0.23x^4 + 0.03x^5 & \text{if } x < x_0, \\ 3.71 + 18.49x - 54.81x^2 + 74.3x^3 - 45.02x^4 + 9.83x^5 & \text{otherwise.} \end{cases}
f_\text{Curvature}(x) = \begin{cases} 0.48 + 1.27x - 3.44x^2 + 14.147x^3 + 23.694x^4 + 10.995x^5 & \text{if } x < x_0, \\ 0.48 + 0.84x - 0.3x^2 - 2.397x^3 - 0.901x^4 + 3.56x^5 & \text{otherwise.} \end{cases}

For each latent function f, we generate 100 data sets with n = 100 observations each (xi, yi) according to the following procedure:

x_i \sim \mathcal{U}(-1, 1), \qquad y_i \mid x_i, \sigma, d, f \sim \mathcal{N}\left(f(x_i) + d \cdot [x_i \geq x_0],\; \sigma^2\right),

where the threshold x0 = 0. We fix σ = 1.0 and vary d ∈ {0, 0.5, …, 4.0}, effectively providing a range of different signal-to-noise regimes.
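This simulation protocol can be reproduced with a few lines of numpy; the function below is a sketch with illustrative names.

```python
import numpy as np

def simulate_rd(f, d, x0=0.0, n=100, sigma=1.0, rng=None):
    """Generate one RD data set: y = f(x) + d * 1[x >= x0] + Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=n)
    y = f(x) + d * (x >= x0) + rng.normal(0.0, sigma, size=n)
    return x, y

# Example: one data set for the linear latent function with effect size d = 2.
x, y = simulate_rd(lambda x: 0.23 + 0.89 * x, d=2.0)
```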

Next, we subject the simulated data to analysis by BNDD, using a first-order polynomial, an exponential, a Matérn (ν = 3/2) and an exponentiated-quadratic covariance function, as well as the Bayesian model average of this set. Fig 1 shows an example run of BNDD on the functions considered in our simulation. Here, the different functions are shown together with the regressions by both the continuous and discontinuous models, for each of the four considered covariance functions. The vertical bars in the figure show the expectation of the estimated effect size p(d | D, M1).

Fig 1. Regression discontinuity example.


One simulation run for effect sizes d ∈ {0.25, 1.0, 4.0} and σ = 1.0. The covariance functions used here are linear, exponential, Matérn (ν = 3/2) and exponentiated-quadratic. The vertical bars indicate the estimated effect sizes by the discontinuous models for the different covariance functions. As the figure shows, the linear covariance function tends to have the strongest bias, in particular in the low signal-to-noise regime.

For each covariance function, we compute the Bayes factor for the presence of a discontinuity, and we estimate both the conditional and marginal effect size (8). For the RDD baseline with bandwidth optimization, and the 2-stage GP approach [26], we compute both the estimated effect size as well as the p-value for the test of a present effect. The performance of the different approaches is quantified by the absolute error between the estimated and true effect size. In addition, we show the decision metric for each method.

Fig 2 shows the absolute difference between the true effect size and the posterior expectations, as well as these decision metrics. The discontinuous model overestimates d when the true effect size is small, as is to be expected from the implicit conditioning on an effect. The BMA does not have this bias, resulting in lower errors for small and absent effects. For medium effects, this itself can result in a bias due to shrinkage (e.g. the Cubic function), while for large effects the BMA converges to M1 and the bias disappears. Generally, BNDD performs on par with the optimized-bandwidth baseline, with worse performance for the Ludwig function, and better for e.g. Curvature, as well as for most cases with an absent or small effect.

Fig 2. Simulation results.


Top row: the error between the true and estimated effect size. The dashed line indicates the 2-stage GP approach (see text), which is equivalent to M1. Middle row: the log Bayes factor. Bottom row: the p-values obtained by the RDD baseline (black) and the 2-stage approach. The horizontal dashed lines indicate the common thresholds of |BF| < 3 and p = 0.05. Error bars indicate standard errors over simulation runs.

The decision metrics show that for small or absent effects, BNDD can report evidence in favor of the null, while the corresponding p-values are inconclusive. The methods positively identify effects at roughly the same true effect sizes. An interesting special case is observed for the Lee and Ludwig functions, which both feature a discontinuity in their derivative [26], which is correctly picked up by BNDD even when the magnitude of the effect is small, confirming the ability to detect discontinuities of higher orders.

Simulated ITS

We explore the ITS application of BNDD in another simulation. Here, we generate oscillating data where for x ≥ x0 a frequency shift is introduced. The latent function for the ITS simulation is given by

f(x) = \begin{cases} \sin(12x) + \frac{2}{3}\cos(25x) & \text{for } x < x_0, \\ \sin\big((12+\alpha)x\big) + \frac{2}{3}\cos\big((25+\alpha)x\big) & \text{for } x \geq x_0, \end{cases}

with x0 = 0, and where α indicates the shift in frequency (set to α = 4 in the example figure). We vary α across the range [0, …, 8] Hz. For observation noise, we once more assume

y \sim \mathcal{N}\left(f(x), \sigma^2\right),

and σ2 = 0.2. For each value of α, we generate 20 datasets containing n = 200 evenly spaced observations.
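A sketch of this data-generating process follows; the input range is not stated in the text, so the interval used below is an assumption.

```python
import numpy as np

def simulate_its(alpha, x0=0.0, n=200, sigma2=0.2, rng=None):
    """Generate one ITS data set with a post-intervention frequency shift alpha."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(-1.0, 1.0, n)             # evenly spaced inputs; the range is an assumption
    shift = np.where(x >= x0, alpha, 0.0)
    f = np.sin((12.0 + shift) * x) + (2.0 / 3.0) * np.cos((25.0 + shift) * x)
    return x, f + rng.normal(0.0, np.sqrt(sigma2), size=n)
```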

We compare our extrapolations based on the spectral mixture covariance function with an ARMA model, which is commonly used in ITS designs [42, 43]. The parameters of the ARMA model are determined using a grid search and its BIC score. We then compare the root-mean-squared error between samples from the predictive distribution obtained by BNDD and the true post-intervention signal, and similarly evaluate the ARMA extrapolations against the true signal. An example simulation run and BNDD application is shown in Fig 3A, with a post-intervention frequency shift of α = 4 Hz. The model correctly recovers the true power spectrum, as well as the decreased amplitude of the second harmonic component post-intervention, and finds evidence in favor of an effect that is barely worth mentioning (logBF = 0.15). The estimated spectral mixture of the continuous model is centered between the true frequencies of the control and intervention group (Fig 3B). This faithfully represents the null hypothesis that the observations can be explained without any changes in spectral content. As the discontinuity grows larger, the standard deviation of the components of the continuous model increases as well, since it has to account for a larger difference. M1 instead correctly identifies the true mixture components. Fig 3C shows the RMSE between samples from the posterior distribution of f1 and the true function, as well as the ARMA estimate. BNDD consistently outperforms the baseline.

Fig 3. ITS simulation.


A. ITS application. Model fit and extrapolation of M0 and M1. The data were generated with a post-intervention frequency shift of α = 4. We find logBF = 0.15. The shaded interval represents two standard deviations around the mean. B. Estimated power spectra. The colors of the power density spectrum correspond to the legend of the regression. C. The RMSE between the estimated and true dITS using posterior samples of BNDD and an ARMA baseline (dashed line).

Applications

The effects of winning an election on longevity

A recent study [44] investigated the effect of running for US gubernatorial office on longevity. The authors use a regression discontinuity design, and conclude that politicians winning a close election live 5 to 10 years longer than if they had lost. These findings have been heavily criticized [45], and it is unclear whether a regression discontinuity analysis is actually appropriate here, as there is no clear intervention at x = 0 (where x is the percentile difference in election result). Despite these concerns, we analyze this data set here as it allows us to demonstrate some of the functionality specific to BNDD. The data are available from the original publication [44], and subsequently preprocessed following [45].

Using the linear RDD baseline with the Imbens-Kalyanaraman procedure [30], we find an optimal bandwidth of 5.48 percentile points. When using this bandwidth and testing for an effect, we find p = 0.019 and an estimated effect size of 9.4 years. With BNDD, using either a linear, exponential, or Matérn (ν = 3/2) covariance function, we find a more parsimonious explanation of these data by a constant function and a substantial noise term σn2, with log Bayes factors of −0.12, 0.0, and 0.0, respectively (Fig 4). This indicates that no clear conclusion can be drawn from these data, and that such a scenario is clearly identified by BNDD.

Fig 4. Election result effects on longevity.


Discontinuity analysis of the effect of close gubernatorial elections on longevity [44]. Shown are regressions by BNDD using a Matérn covariance function, and a linear RD baseline with an optimized bandwidth of 5.48 percentile points (shaded area). For BNDD, the regressions for M0 and M1 are nearly identical. For the baseline, the bandwidth optimization leads to a poor linear fit, and hence a spurious detection of an effect.

Phantom border effect on Dutch government elections

In 2017, the Dutch general elections were held. According to Dutch electoral geographer De Voogd, the share of votes that goes to populist parties (we refrain from an extensive discussion of the definition of populism and refer to populist parties as those parties that emphasize ‘an alleged chasm between the elite and the general population’; in the Dutch 2017 elections, parties that fit this description were PVV, SP, 50Plus and FvD [46]) differs north and south of a so-called ‘phantom border’, a line that historically divided the Catholic south of the Netherlands from the Protestant north [47, 48]. This border serves as a two-dimensional threshold along which one can apply RD design. This special case of RD design, in which the assignment threshold is a geographical boundary, is also referred to as GeoRDD [10]. Here, we test De Voogd's hypothesis.

First, the vote distributions per Dutch municipality were collected from the Dutch government website [49]. We then manually constructed an approximation of the phantom border (see the dashed lines in Fig 5) and used it to divide the available municipalities into those above and those below the border. For visualization of country and municipality borders, data from the Dutch national georegister were used [50]. Next, we applied BNDD using the linear and first-order Matérn covariance functions. The results of the analysis are shown in Fig 5. The figure shows the Netherlands with the fraction of populist votes per municipality superimposed, together with the phantom border representing the supposed divide in voting behaviour.

Fig 5. Phantom-border effects on populist voting.


Discontinuity analysis along a two-dimensional boundary (indicated by the dashed line). A. Circles indicate the observed fraction of populist votes; municipalities are shaded according to the Gaussian process predictions. B. The distribution of effect size conditioned on M1, p(dD,M1), along the phantom border. The shaded interval indicates one standard deviation around the mean. The country and municipality border data are available at the website of the Dutch national georegister [50], and the superimposed populist voting fractions were derived from the 2017 election results at https://data.overheid.nl [49].

If we assume a linear underlying process, there is strong evidence for a discontinuity (logBF = 24.4), confirming the hypothesis by De Voogd. Visually however, the data do not appear to follow these linear trends. The nonparametric Matérn covariance function results in evidence against an effect (logBF = −3.5). As the Matérn covariance function fits the data much more accurately than the linear covariance function, the Bayesian model average is completely dominated by the former, leading to the conclusion that the historical phantom border does not create a geographic discontinuity in populist voting behaviour.

Kundalini meditation effect on heart rate

Earlier work [51] studied the hypothesis that Kundalini Yoga meditation techniques reduce one’s heart rate. However, the authors find the opposite: the meditation instead brings about an increase in heart rate. The experiment lends itself well to ITS design, but the analysis may be difficult to perform in practice because the data are not evenly sampled. However, even sampling is not a prerequisite for Gaussian process regression, which together with the spectral mixture kernel [15] is well-suited to model these data. The observations are obtained from the PhysioNet database and consist of the heart rates of two women and two men, aged 20–52 (mean 33) [52]. We focus on one participant due to space constraints. Since we do not merely want to detect a change in absolute heart rate, but also in its fluctuations, we use a changepoint mean function [53] for M0 and two separate constant mean functions for M1 to capture the different means. Fig 6 shows the corresponding regression and extrapolation. The continuous model requires more spectral mixture components: Q = 6 for f0, compared to Q = 2 for f(x; θ1A) and Q = 3 for f(x; θ1B). The analysis finds overwhelming evidence for an effect (logBF = 281.2).

Fig 6. Kundalini meditation effect on heart rate.


Analysis of meditation effect on heart rate. Shown is the heart rate of a participant who starts meditating at x0 = 00:00. The extrapolation, indicated by the dashed (mean) and dotted (posterior samples) red lines, is poor in comparison to the actual observations, which is corroborated by the large log Bayes factor. The panel on the right shows the (log) power spectra expressed by the optimized covariance function hyperparameters.

Discussion

In order to infer causality from QED, one assumes that the alleged change occurs at the threshold, but that the latent process is otherwise stationary. Consequently, the behaviour of the two groups changes sharply around the intervention. In standard RD studies, this locality is controlled via a bandwidth parameter that determines the sensitivity of the detection approach [20]. This requires the availability of sufficient data around the threshold, and the analysis is sensitive to this parameter. In BNDD with stationary nonparametric covariance functions, the bandwidth is replaced by a length-scale hyperparameter, which we optimize using the model marginal likelihood. The length-scale regulates how fast the correlations between consecutive points decay with their distance, and thus how sensitive BNDD is around the threshold [54]. This implements a trade-off between estimation reliability and the locality assumption that is needed to draw causal inferences [16]. The benefit of a length-scale instead of a fixed bandwidth parameter is that the relative influence of observations decreases gradually as they are further away from the intervention point, and that this distance is automatically adjusted.

With an exponential covariance function the most rigorous form of locality can be enforced. Here, the Markov properties of the Gaussian process guarantee that only discontinuities at the intervention threshold are detectable. On the other hand, non-local covariance functions such as the periodic covariance function are vulnerable to false positives if the true process is non-stationary. Here, the presence of change points away from the intervention threshold can lead to false alarms, due to the flexibility of the regressions. In this case, or in exploratory applications, BNDD can be performed in a sliding-window fashion to ensure that the highest Bayes factor is at the intervention threshold.

The Bayesian model averaging procedure that we use in BNDD depends on the model probabilities p(M0) and p(M1). Here, we have assumed a uniform prior on these model probabilities, as we have no reason to prefer either the continuous null model or the discontinuous alternative. However, it should be noted that prior beliefs may be incorporated to reflect our initial assumptions on the probability of an effect, as well as to adjust for multiplicity in case many hypotheses are tested simultaneously [36] (for instance, in [55] a regression discontinuity design is used to test the causal influence between neuronal populations).

BNDD can be extended in several ways. For instance, we do not currently account for covariates that may serve as confounds for causal inference [19, 35]. However, such covariates can be explicitly taken into account in the regression models, or even be learned from the observations [56]. Covariate selection can be performed using automatic relevance determination [57], where we learn separate length-scales for each covariate. Furthermore, improvement is expected from more accurate estimators of the model marginal likelihood than the BIC, such as the ELBO or bridge sampling [58]. Throughout this paper, we have assumed a Gaussian likelihood. This conveniently leads to an analytic solution of the GP posterior, because the GP prior is conjugate to this likelihood. However, using variational inference or the Laplace approximation, BNDD can be used in combination with non-Gaussian observation models [37]. For instance, one could use a Poisson likelihood to model observed count data [59], or a Bernoulli likelihood for binary observations [60]. Furthermore, other nonparametric priors over the latent functions may be used, such as the Student-t process [61]. Other extensions include modelling nondeterministic application of the threshold assignment, delayed response functions, and multi-dimensional response variables [62]. BNDD extends naturally to the setting of multiple assignment variables [18, 63–66].

Conclusion

In all, BNDD serves as a Bayesian nonparametric approach for causal inference in quasi-experimental designs. By selecting the appropriate covariance function, one has precise control over the type of discontinuity that can be detected, as well as a priori assumptions of the latent data generating processes. Importantly, Bayesian model averaging allows us to marginalize over key assumptions, such as the choice of covariance function, or the presence/absence of an effect. The resulting method is a nuanced framework for discontinuity-based research designs.

Data Availability

All data and analyses are available on Github: https://github.com/mhinne/BNQD.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Hill AB. The clinical trial. N Engl J Med. 1952;247:113–119. doi: 10.1056/NEJM195207242470401 [DOI] [PubMed] [Google Scholar]
  • 2. Pearl J. Causal inference in statistics: An overview. Statistics Surveys. 2009;3:96–146. doi: 10.1214/09-SS057 [DOI] [Google Scholar]
  • 3. Imbens GW, Rubin DB. Causal Inference in Statistics, Social, and Biomedical Sciences. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press; 2015. [Google Scholar]
  • 4. Shadish W, Cook T, Campbell D. Experimental and quasi-experimental designs for generalized causal inference. 2nd ed. Cencage Learning, Inc.; 2002. [Google Scholar]
  • 5. Campbell DT, Stanley JC. Experimental and Quasi-Experimental Designs for Research. Rand McNally College Publishing; 1963. [Google Scholar]
  • 6. Lauritzen SL. Causal inference from graphical models. Complex stochastic systems. 2001; p. 63–107. [Google Scholar]
  • 7. McDowall D, McCleary R, Meidinger E, Hay R. Interrupted time series analysis. Thousand Oaks, CA, USA: Sage Publications Inc.; 1980. [Google Scholar]
  • 8. Lee DS, Lemieux T. Regression discontinuity designs in econometrics. Journal of Economic Literature. 2010;48(June):281–355. doi: 10.1257/jel.48.2.281 [DOI] [Google Scholar]
  • 9. Imbens GW, Lemieux T. Regression discontinuity designs: A guide to practice. Journal of Econometrics. 2008;142(2):615–635. doi: 10.1016/j.jeconom.2007.05.001 [DOI] [Google Scholar]
  • 10. Rischard M, Branson Z, Miratrix L, Bornn L. Do School Districts Affect NYC House Prices? Identifying Border Differences Using a Bayesian Nonparametric Approach to Geographic Regression Discontinuity Designs. Journal of the American Statistical Association. 2021;116(534):619–631. doi: 10.1080/01621459.2020.1817749 [DOI] [Google Scholar]
  • 11. Wagenmakers EJ. A practical solution to the pervasive problems of p-values. Psychonomic Bulletin & Review. 2007;14(5):779–804. doi: 10.3758/BF03194105 [DOI] [PubMed] [Google Scholar]
  • 12. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: A tutorial. Statistical Science. 1999;14(4):382–401. [Google Scholar]
  • 13. Hinne M, Gronau QF, van den Bergh D, Wagenmakers EJ. A Conceptual Introduction to Bayesian Model Averaging. Advances in Methods and Practices in Psychological Science. 2020;3(2):200–215. doi: 10.1177/2515245919898657 [DOI] [Google Scholar]
  • 14. MacKay DJC. Information Theory, Inference & Learning Algorithms. USA: Cambridge University Press; 2002. [Google Scholar]
  • 15.Wilson A, Adams R. Gaussian process kernels for pattern discovery and extrapolation. In: International conference on machine learning; 2013. p. 1067–1075.
  • 16. Geneletti S, O’Keeffe AG, Sharples LD, Richardson S, Baio G. Bayesian regression discontinuity designs: Incorporating clinical knowledge in the causal analysis of primary care data. Statistics in Medicine. 2015;34:2334–2352. doi: 10.1002/sim.6486 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Thistlethwaite DL, Campbell DT. Regression-discontinuity analysis: an alternative to the ex-post facto experiment. Journal of Educational Psychology. 1960;51:309–317. doi: 10.1037/h0044319 [DOI] [Google Scholar]
  • 18. Choi JY, Lee MJ. Regression discontinuity: review with extensions. Statistical Papers. 2017;58(4):1217–1246. doi: 10.1007/s00362-016-0745-z [DOI] [Google Scholar]
  • 19. Harris AD, Bradham DD, Baumgarten M, Zuckerman IH, Fink JC, Perencevich EN. The use and interpretation of quasi-experimental studies in infectious diseases. Antimicrobial resistance. 2004;38:1586–1591. [DOI] [PubMed] [Google Scholar]
  • 20. Bloom HS. Modern Regression Discontinuity Analysis. Journal of Research on Educational Effectiveness. 2012;5(1):43–82. doi: 10.1080/19345747.2011.578707 [DOI] [Google Scholar]
  • 21. Marinescu IE, Lawlor PN, Kording KP. Quasi-experimental causality in neuroscience and behavioural research. Nature Human Behaviour. 2018; p. 1–11. [DOI] [PubMed] [Google Scholar]
  • 22. Moscoe E, Bor J, Bärnighausen T. Regression discontinuity designs are underutilized in medicine, epidemiology, and public health: A review of current and best practice. Journal of Clinical Epidemiology. 2015;68(2):132–143. doi: 10.1016/j.jclinepi.2014.06.021 [DOI] [PubMed] [Google Scholar]
  • 23. Li T, Ungar L, Kording K. Quantifying causality in data science with quasi-experiments. Nature Computational Science. 2021;1:24–32. doi: 10.1038/s43588-020-00005-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Freni-Sterrantino A, Ghosh RE, Fecht D, Toledano MB, Elliott P, Hansell AL, et al. Bayesian spatial modelling for quasi-experimental designs: An interrupted time series study of the opening of municipal waste incinerators in relation to infant mortality and sex ratio. Environment International. 2019;128(November 2018):109–115. doi: 10.1016/j.envint.2019.04.009 [DOI] [PubMed] [Google Scholar]
  • 25. Hahn BYJ, Todd P, van der Klaauw W. Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica. 2001;69(1):201–209. doi: 10.1111/1468-0262.00183 [DOI] [Google Scholar]
  • 26. Branson Z, Rischard M, Bornn L, Miratrix LW. A Nonparametric Bayesian Methodology for Regression Discontinuity Designs. Journal of Statistical Planning and Inference. 2019;202:14–30. doi: 10.1016/j.jspi.2019.01.003 [DOI] [Google Scholar]
  • 27. Bernal JL, Cummins S, Gasparrini A. Interrupted time series regression for the evaluation of public health interventions: A tutorial. International Journal of Epidemiology. 2017;46(1):348–355. doi: 10.1093/ije/dyw098 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66(5):688–701. doi: 10.1037/h0037350 [DOI] [Google Scholar]
  • 29. O’Keeffe AG, Baio G. Approaches to the Estimation of the Local Average Treatment Effect in a Regression Discontinuity Design. Scandinavian Journal of Statistics. 2016;43(4):978–995. doi: 10.1111/sjos.12224 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Imbens G, Kalyanaraman K. Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies. 2012;79(3):933–959. doi: 10.1093/restud/rdr043 [DOI] [Google Scholar]
  • 31. Kim Y, Steiner P. Quasi-experimental designs for causal inference. Educational Psychologist. 2016;51(3-4):395–405. doi: 10.1080/00461520.2016.1207177 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association. 1995;90(430):773–795. doi: 10.1080/01621459.1995.10476572 [DOI] [Google Scholar]
  • 33. Jarosz AF, Wiley J. What Are the Odds? A Practical Guide to Computing and Reporting Bayes Factors. Journal of Problem Solving. 2014;7:2–9. doi: 10.7771/1932-6246.1167 [DOI] [Google Scholar]
  • 34. Goodman SN. Toward Evidence-Based Medical Statistics. 1: The p-Value fallacy. Annals of Internal Medicine. 1999;130(12):995–1004. doi: 10.7326/0003-4819-130-12-199906150-00008 [DOI] [PubMed] [Google Scholar]
  • 35. Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL. Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics. 2015;9(1):247–274. doi: 10.1214/14-AOAS788 [DOI] [Google Scholar]
  • 36. Guo M, Heitjan DF. Multiplicity-calibrated Bayesian hypothesis tests. Biostatistics. 2010;11:473–483. doi: 10.1093/biostatistics/kxq012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. The MIT Press; 2005. [Google Scholar]
  • 38. Bochner S. Lectures on Fourier integrals. vol. 42. Princeton University Press; 1959. [Google Scholar]
  • 39. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464. doi: 10.1214/aos/1176344136 [DOI] [Google Scholar]
  • 40. Matthews AG, van der Wilk M, Nickson T, Fujii K, Boukouvalas A, León-Villagrá P, et al. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research. 2017;18(40):1–6. [Google Scholar]
  • 41. Vanderplas JT. Understanding the Lomb–Scargle periodogram. The Astrophysical Journal Supplement Series. 2018;236(1):16. doi: 10.3847/1538-4365/aab766 [DOI] [Google Scholar]
  • 42. Prado R, West M. Time Series: Modeling, Computation, and Inference. 1st ed. Chapman & Hall/CRC; 2010. [Google Scholar]
  • 43. Jandoc R, Burden AM, Mamdani M, Linda EL, Cadarette SM. Interrupted time series analysis in drug utilization research is increasing: systematic review and recommendations. Journal of Clinical Epidemiology. 2015;68:950–956. doi: 10.1016/j.jclinepi.2014.12.018 [DOI] [PubMed] [Google Scholar]
  • 44. Barfort S, Klemmensen R, Larsen EG. Longevity returns to political office. Political Science Research and Methods. 2021;9:658–664. doi: 10.1017/psrm.2019.63 [DOI] [Google Scholar]
  • 45.Gelman A. No, I don’t believe that claim based on regression discontinuity analysis that…; 2020. https://statmodeling.stat.columbia.edu/2020/07/02/no-i-dont-believe-that-claim-based-on-regression-discontinuity-analysis-that/.
  • 46. Müller JW. What is populism? Philadelphia, PA, U.S.A.: University Of Pennsylvania Press; 2016. [Google Scholar]
  • 47.De Voogd J. Van Volendam tot Vinkeveen: de electorale geografie van de PVV; 2016.
  • 48.De Voogd J. Deze eeuwenoude grenzen kleuren de verkiezingen nog altijd; 2017.
  • 49.Kennis- en Exploitatiecentrum Officiële Overheidspublicaties (KOOP). Verkiezingsuitslag Tweede Kamer 2017; 2018. https://data.overheid.nl/dataset/verkiezingsuitslag-tweede-kamer-2017.
  • 50.Centraal Bureau voor Statistiek. CBS gebiedsindelingen; 2022. https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische-data/cbs-gebiedsindelingen.
  • 51. Peng C, Mietus JE, Liu Y, Khalsa G, Douglas PS, Benson H, et al. Exaggerated heart rate oscillations during two meditation techniques. International Journal of Cardiology. 1999;70:101–107. doi: 10.1016/S0167-5273(99)00066-2 [DOI] [PubMed] [Google Scholar]
  • 52. Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 2000;101(23):e215–e220. [DOI] [PubMed] [Google Scholar]
  • 53.Saatçi Y, Turner R, Rasmussen CE. Gaussian Process Change Point Models. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10. Madison, WI, USA: Omnipress; 2010. p. 927–934.
  • 54.Duvenaud D. Automatic model construction with Gaussian processes; 2014.
  • 55. Lansdell BJ, Kording KP. Neural spiking for causal inference. bioRxiv. 2019. doi: 10.1101/253351 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Kocaoglu M, Shanmugam K, Bareinboim E. Experimental Design for Learning Causal Graphs with Latent Variables. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Information Processing Systems (NeurIPS) 30. Curran Associates, Inc.; 2017. p. 7018–7028. [Google Scholar]
  • 57. Wipf DP, Nagarajan SS. A New View of Automatic Relevance Determination. In: Platt JC, Koller D, Singer Y, Roweis ST, editors. Advances in Neural Information Processing Systems 20. vol. 20. Curran Associates, Inc.; 2008. p. 1625–1632. [Google Scholar]
  • 58. Fourment M, Magee AF, Whidden C, Bilge A, Matsen FA, Minin VN. 19 Dubious Ways to Compute the Marginal Likelihood of a Phylogenetic Tree Topology. Systematic Biology. 2020;69(2):209–220. doi: 10.1093/sysbio/syz046 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Adams RP, Murray I, MacKay DJC. Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities. In: Bottou L, Littman M, editors. Proceedings of the 26th International Conference on Machine Learning (ICML). Montreal: Omnipress; 2009. p. 9–16.
  • 60. Williams CKI, Barber D. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20(12):1342–1351. doi: 10.1109/34.735807 [DOI] [Google Scholar]
  • 61.Shah A, Wilson AG, Ghahramani Z. Student-t processes as alternatives to Gaussian processes. In: Kaski S, Corander J, editors. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics. vol. 33 of Proceedings of Machine Learning Research. Reykjavik, Iceland: PMLR; 2014. p. 877–885.
  • 62.Osborne MA, Roberts SJ, Rogers A, Ramchurn SD, Jennings NR. Towards real-time information processing of sensor network data using computationally efficient multi-output Gaussian processes. In: 2008 International Conference on Information Processing in Sensor Networks (ipsn 2008); 2008. p. 109–120.
  • 63. Reardon SF, Robinson JP. Regression Discontinuity Designs With Multiple Rating-Score Variables. Journal of Research on Educational Effectiveness. 2012;5(1):83–104. doi: 10.1080/19345747.2011.609583 [DOI] [Google Scholar]
  • 64. Papay JP, Willett JB, Murnane RJ. Extending the regression-discontinuity approach to multiple assignment variables. Journal of Econometrics. 2011;161(2):203–207. doi: 10.1016/j.jeconom.2010.12.008 [DOI] [Google Scholar]
  • 65. Wong VC, Steiner PM, Cook TD. Analyzing Regression-Discontinuity Designs With Multiple Assignment Variables: A Comparative Study of Four Estimation Methods. Journal of Educational and Behavioral Statistics. 2013;38(2):107–141. doi: 10.3102/1076998611432172 [DOI] [Google Scholar]
  • 66. Choi JY, Lee MJ. Regression Discontinuity with Multiple Running Variables Allowing Partial Effects. Political Analysis. 2018;26(3):258–274. doi: 10.1017/pan.2018.13 [DOI] [Google Scholar]

Decision Letter 0

Lalit Chandra Saikia

27 Apr 2022

PONE-D-22-03933: Bayesian model averaging for nonparametric discontinuity design (PLOS ONE)

Dear Dr. Hinne,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The authors must address all the issues raised by the reviewer. The paper is recommended for major revision.

Please submit your revised manuscript by Jun 11 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Lalit Chandra Saikia, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed:

- https://arxiv.org/abs/1911.06722

In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.

3. We note that Figure 5 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

 a. You may seek permission from the original copyright holder of Figure 5 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

 USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Additional Editor Comments:

The authors must address all the issues raised by the reviewer. The paper is recommended for major revision.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper presents a framework for discontinuity-based designs using Bayesian model averaging and Gaussian process regression. It is a topic of interest to researchers in related areas. For the reader, however, a number of points need clarifying, and certain statements require further justification. My detailed comments are as follows:

(1) In the abstract, the proposed quantification of evidence in favor of either model is crucial to the contribution of Bayesian model averaging, because the prior probabilities of the different models greatly affect the results of the Bayesian model averaging method; please describe these priors in detail in this section.

(2) Is this article more applicable to the discontinuous model? The study of the continuous model is not comprehensive and detailed; does this paper focus mainly on describing the discontinuous model?

(3) It is noteworthy that your paper requires careful editing of its format. There are many formatting problems, such as the indentation of the first line of each paragraph, as well as multiple syntax errors.

Reviewer #2: This is quite an interesting paper on nonparametric discontinuity design with Bayesian model averaging. Previous approaches suffer from overconfidence arising from implicit assumptions and from model misspecification due to overly simplistic regression models. To overcome these shortcomings, this paper proposes a framework for discontinuity-based designs using Bayesian model averaging and Gaussian process regression, namely 'Bayesian nonparametric discontinuity design' (BNDD).

This paper demonstrates the usage of BNDD in multiple promising applications, such as the effect of winning an election on longevity, the phantom border effect on Dutch government elections, and the effect of Kundalini meditation on heart rate.

The paper is also well written and structured, and it is easy to follow.

One minor doubt for me is that the paper assumes the data follow a Gaussian distribution, and thus a Gaussian process is used. Is it possible that real-world data do not follow a Gaussian distribution and therefore cannot be modelled with a Gaussian process?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Lalit Chandra Saikia

8 Jun 2022

Bayesian model averaging for nonparametric discontinuity design

PONE-D-22-03933R1

Dear Dr. Hinne,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Lalit Chandra Saikia, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper can be accepted; the research is interesting, and the authors have done sufficient work.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Lalit Chandra Saikia

14 Jun 2022

PONE-D-22-03933R1

Bayesian model averaging for nonparametric discontinuity design

Dear Dr. Hinne:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Lalit Chandra Saikia

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All data and analyses are available on Github: https://github.com/mhinne/BNQD.

