Journal of Applied Statistics. 2020 Dec 29;49(6):1421–1448. doi: 10.1080/02664763.2020.1864814

Bayesian hierarchical models for linear networks

Zainab Al-kaabawi 1, Yinghui Wei 1, Rana Moyeed 1
PMCID: PMC9042008  PMID: 35707112

Abstract

The purpose of this study is to highlight dangerous motorways via estimating the intensity of accidents and study its pattern across the UK motorway network. Two methods have been developed to achieve this aim. First, the motorway-specific intensity is estimated by using a homogeneous Poisson process. The heterogeneity across motorways is incorporated using two-level hierarchical models. The data structure is multilevel since each motorway consists of junctions that are joined by grouped segments. In the second method, the segment-specific intensity is estimated. The homogeneous Poisson process is used to model accident data within grouped segments but heterogeneity across grouped segments is incorporated using three-level hierarchical models. A Bayesian method via Markov Chain Monte Carlo is used to estimate the unknown parameters in the models and the sensitivity to the choice of priors is assessed. The performance of the proposed models is evaluated by a simulation study and an application to traffic accidents in 2016 on the UK motorway network. The deviance information criterion (DIC) and the widely applicable information criterion (WAIC) are employed to choose between models.

Keywords: Hierarchical models, Bayesian methods, linear networks, point processes

1. Introduction

Traffic crashes have a considerable impact on people, the economy and society. To improve road safety, traffic accident research often seeks methods for predicting accidents. Traditional crash prediction models, such as generalized linear models, are widely used in traffic safety studies. However, multilevel data structures are common because of the techniques used to collect or cluster traffic data [21]. Ignoring the hierarchical nature of the data may produce unreliable parameter estimates and statistical inference. Hierarchical modelling is a statistical approach that properly takes account of multilevel data structure [16,21]. Hierarchical modelling has been employed in many research fields such as sociology, education, political science and public health. Shankar et al. [41] showed that the explanatory power of crash models improved when site-specific random effects and a time indicator were incorporated into the negative binomial regression model. Jones and Jørgensen [25] discussed possible applications of hierarchical models to road traffic accidents in Norway. The use of hierarchical modelling to represent multilevel data structure in crash prediction has been growing since then. In some research, hierarchical models were used to predict crash frequency [12,18,24,26,31,34,36,39]; in other research, they were developed to identify factors affecting crash severity [23,25,29]. In previous studies, models were proposed to account for unobserved heterogeneity at the segment level, for example, hierarchical Bayesian binary probit models [40], binary logit models [15], Bayesian multilevel Poisson-lognormal joint models [2], multivariate hierarchical Poisson-lognormal spatial joint models [1], and grouped latent class ordered probit models with class-probability functions [13].
In general, these models accommodated the unobserved heterogeneity arising from the random effects terms, the possibility of systematic variations of unobserved groups across the highway segments or corridors which consist of intersections and roadway segments. In our study, two-level and three-level Bayesian hierarchical models are proposed to capture the unobserved heterogeneity by varying the accident intensity across the motorways and motorway segments.

The motorway network is considered as a linear network and road accidents as a spatial point pattern involving the spatial locations of accidents. A linear network $L$ is defined as the union $L=\bigcup_{i=1}^{n}\ell_i$ of a finite collection of line segments $\ell_1,\ldots,\ell_n$ in the plane [4]. The line segment in the plane with endpoints $u$ and $v$ is given by $[u,v]=\{tu+(1-t)v: 0\le t\le 1\}$. For a point process on a linear network, the intensity of points along the network is defined as the expected number of points per unit length of network [5].
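As a small illustration of these definitions, the following Python sketch computes the total length of a network given as a list of segments and the average number of points per unit length, which is the natural empirical counterpart of a constant intensity. The coordinates, the point count and the function names are made up for illustration.

```python
import math

def segment_length(u, v):
    """Euclidean length of the line segment [u, v] in the plane."""
    return math.hypot(v[0] - u[0], v[1] - u[1])

def average_intensity(segments, n_points):
    """Points per unit length for a network given as a list of (u, v) segments."""
    total_length = sum(segment_length(u, v) for u, v in segments)
    return n_points / total_length

# Hypothetical network: two segments of length 5 and 10 (arbitrary units).
network = [((0.0, 0.0), (3.0, 4.0)), ((3.0, 4.0), (3.0, 14.0))]
intensity = average_intensity(network, n_points=30)  # 30 points on 15 units of length
```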

The novelty of this work is to estimate the intensity function of accidents and study its pattern across the UK motorway network using a Bayesian approach. We propose hierarchical models to estimate the motorway-specific and segment-specific accident intensities. In a two-level model, the intensity of accidents is considered homogeneous across segments within each motorway, but inhomogeneous across motorways. Each motorway consists of junctions that are joined by grouped segments. The intensity of accidents may be inhomogeneous across grouped segments. Thus, ignoring the heterogeneity between grouped segments may underestimate the standard error of the accident intensity. In a three-level model, the intensity of accidents is considered homogeneous within grouped segments, but inhomogeneous among grouped segments and motorways.

2. Description of data

The accident data are obtained from the Department for Transport in Great Britain [43]. Police officers record accidents on the road network using the STATS19 reporting form. This form gives details about accident circumstances such as road and weather conditions, the time and location of the accident, the driver's behaviour and the vehicles involved. Each traffic accident is registered as a sample point on a link of the road network. The road network consists of individual road sections that are segmented into junction-to-junction links of different lengths. For this study, the unit of measurement for length is the kilometre. Segments are grouped according to count point locations on Great Britain's motorway network. At these count points, traffic estimates are calculated for each link of the motorway network, with a link's start and end points defined as where the link joins a motorway junction [44,45]. Each link has a uniquely referenced Count Point (CP), where the traffic is usually counted by enumerators [44].

There are 51 motorways in the UK and data are recorded on 49 motorways by STATS19. The hierarchical structure of data in this study is shown in Figure 1.

Figure 1.


The hierarchical structure of UK motorway network in 2016. Mi denotes motorway i. M1GS1 represents the first grouped segments on M1; M1GS83 represents the eighty third grouped segments on M1 and so on for the rest of the symbols of grouped segments for other motorways. The grouped segments are links joining motorway junctions and they are grouped to incorporate the heterogeneity of the accident intensity across them [44]. (a) Two-level hierarchical model. (b) Three-level hierarchical model.

Table 1 shows the descriptive statistics for the accident data in 2016 for 49 motorways in the UK.

Table 1.

Descriptive statistics for accident data on the UK motorways in 2016.

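| Motorway | Length of motorway (km) | Number of accidents | Number of grouped segments |
|---|---|---|---|
| M1 | 304.50 | 593 | 83 |
| M2 | 39.39 | 53 | 14 |
| M3 | 95.89 | 188 | 22 |
| M4 | 299.84 | 475 | 80 |
| M5 | 256.22 | 263 | 52 |
| M6 | 366.76 | 637 | 90 |
| M8 | 87.85 | 163 | 51 |
| M9 | 52.60 | 25 | 18 |
| M11 | 80.41 | 130 | 17 |
| M18 | 45.25 | 48 | 13 |
| M20 | 79.54 | 153 | 18 |
| M23 | 26.59 | 34 | 7 |
| M25 | 181.52 | 657 | 48 |
| M26 | 15.46 | 15 | 2 |
| M27 | 47.86 | 151 | 24 |
| M32 | 6.87 | 17 | 6 |
| M40 | 140.21 | 179 | 33 |
| M42 | 64.93 | 88 | 21 |
| M45 | 12.39 | 3 | 3 |
| M48 | 19.30 | 6 | 5 |
| M49 | 8.30 | 1 | 3 |
| M50 | 33.19 | 5 | 8 |
| M53 | 32.21 | 31 | 17 |
| M54 | 35.90 | 12 | 13 |
| M55 | 19.90 | 19 | 5 |
| M56 | 57.89 | 72 | 30 |
| M57 | 16.48 | 22 | 11 |
| M58 | 18.95 | 7 | 10 |
| M60 | 56.75 | 72 | 40 |
| M61 | 36.50 | 50 | 12 |
| M62 | 154.90 | 253 | 48 |
| M65 | 41.88 | 71 | 19 |
| M66 | 13.64 | 15 | 8 |
| M67 | 7.53 | 4 | 4 |
| M69 | 26.18 | 23 | 5 |
| M73 | 11.23 | 10 | 8 |
| M74 | 141.30 | 55 | 54 |
| M77 | 28.93 | 17 | 22 |
| M80 | 41.67 | 38 | 33 |
| M90 | 49.32 | 23 | 16 |
| M180 | 39.68 | 13 | 6 |
| M181 | 3.78 | 2 | 1 |
| M271 | 3.74 | 8 | 4 |
| M275 | 3.98 | 12 | 5 |
| M602 | 6.37 | 7 | 2 |
| M606 | 4.03 | 14 | 3 |
| M621 | 15.05 | 35 | 12 |
| M876 | 11.89 | 11 | 8 |
| M898 | 1.38 | 2 | 1 |

Note: The length of each motorway is in kilometres.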

3. Models

3.1. Two-level Bayesian hierarchical model (Model 1)

3.1.1. Model definition

Let $m$ denote the total number of motorways. The number of accidents $n_i$ on motorway $i$ $(i=1,\ldots,m)$ follows a Poisson distribution with mean $\lambda_i L_i$, where $L_i$ represents the length of motorway $i$ and $\lambda_i$ is the accident intensity on motorway $i$ per unit length. Here $\lambda_i L_i$ is the expected number of accidents on motorway $i$, and it can vary from one motorway to another because each motorway could have different conditions and features. Let $\alpha_i=\log\lambda_i$ denote the log-intensity function and assume it follows a normal distribution $N(\alpha,\tau^2)$. The two-level hierarchical model for traffic accidents is given below,

$$n_i\sim\mathrm{Pois}(\lambda_i L_i),\quad i=1,\ldots,m,\qquad \alpha_i\sim N(\alpha,\tau^2). \tag{1}$$

Here $\alpha$ is the overall log-intensity and $\tau^2$ is the between-motorway variance. In this model, we assume that each accident's location follows a uniform distribution on the interval $(0,L_i)$.
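To make the generative structure concrete, the sketch below simulates one motorway under the two-level model. It draws a log-intensity from the normal distribution and then generates accident locations as a homogeneous Poisson process on $(0,L_i)$ using exponential gaps, so that the count is Poisson with mean $L_i\exp(\alpha_i)$ and, given the count, locations are uniform on the interval. The parameter values and the function name are illustrative assumptions, not values from the study.

```python
import math
import random

def simulate_motorway(alpha, tau, length, rng):
    """Draw alpha_i ~ N(alpha, tau^2) and simulate a homogeneous Poisson
    process with intensity exp(alpha_i) on the interval (0, length)."""
    alpha_i = rng.gauss(alpha, tau)
    rate = math.exp(alpha_i)
    locations = []
    position = rng.expovariate(rate)       # gap to the first accident
    while position < length:
        locations.append(position)
        position += rng.expovariate(rate)  # gap to the next accident
    return alpha_i, locations

rng = random.Random(1)
# Illustrative values only: alpha = 0 and tau = 0.5 on a motorway of length 50.
alpha_i, locations = simulate_motorway(alpha=0.0, tau=0.5, length=50.0, rng=rng)
n_i = len(locations)  # accident count, Poisson with mean length * exp(alpha_i)
```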

3.1.2. Likelihood function

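Let $N=\{n_i, i=1,\ldots,m\}$ represent the accident counts and $\Theta=\{\alpha_1,\alpha_2,\ldots,\alpha_m;\alpha;\tau^2\}$ the parameters in the two-level hierarchical model. The likelihood for the two-level hierarchical model is given by,

$$L(N\mid\Theta)\propto\prod_{i=1}^{m}\exp\left(n_i\alpha_i-L_i\exp(\alpha_i)\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right). \tag{2}$$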
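A minimal sketch of Equation (2) on the log scale may help when checking an implementation. It evaluates the logarithm of the right-hand side, which omits terms that do not depend on the parameters. The counts and lengths are the first three rows of Table 1; the parameter values passed in are arbitrary illustrations, and the function name is hypothetical.

```python
import math

def log_likelihood(counts, lengths, alphas, alpha, tau2):
    """Logarithm of the right-hand side of Equation (2)."""
    poisson_part = sum(n * a - L * math.exp(a)
                       for n, L, a in zip(counts, lengths, alphas))
    normal_part = sum(-0.5 * math.log(2 * math.pi * tau2)
                      - (a - alpha) ** 2 / (2 * tau2)
                      for a in alphas)
    return poisson_part + normal_part

# M1, M2 and M3 from Table 1; lengths in kilometres.
counts = [593, 53, 188]
lengths = [304.50, 39.39, 95.89]
# Illustrative parameter values: alpha_i set to log(n_i / L_i).
alphas = [math.log(n / L) for n, L in zip(counts, lengths)]
value = log_likelihood(counts, lengths, alphas, alpha=0.5, tau2=0.36)
```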

3.1.3. Prior distribution

The specification of the prior distribution depends on the available information about the unknown parameters. The strategy for specifying prior distributions for the parameters in the hierarchical model includes conjugate, vague and weakly-informative priors. Where possible, conjugate priors are used to ensure that the conditional distributions have closed forms, which eases the simulation; otherwise, we specify non-conjugate priors. In either case, we choose parameter values that give large variances for vague or non-informative priors, and make use of pre-existing data to construct informative priors. For $\alpha$, a conjugate normal prior $N(\mu_0,\sigma_0^2)$ is assigned. We use a conjugate inverse gamma prior with shape parameter $\alpha_0$ and rate parameter $\beta_0$ for $\tau^2$. Alternatively, we specify a uniform prior $\mathrm{unif}(0,a)$, with $a>0$, for the between-motorway standard deviation $\tau$ [28]. Another choice of prior is a half-normal distribution $HN(\theta)$ for $\tau$, where $\theta=\frac{\sqrt{\pi}}{\sigma\sqrt{2}}$ and $\sigma>0$, as detailed in [27].

3.1.4. Posterior distribution

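Posterior if the prior distribution on $\tau^2$ is an inverse gamma distribution

The posterior distribution is proportional to the product of the likelihood and the prior distribution. Therefore, the joint posterior density function for the parameters given the observed data satisfies

$$\pi(\Theta\mid N)\propto\prod_{i=1}^{m}\exp\left(n_i\alpha_i-L_i\exp(\alpha_i)\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right)\times\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(\alpha-\mu_0)^2}{2\sigma_0^2}\right)\times\frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}(\tau^2)^{-\alpha_0-1}\exp\left(-\beta_0/\tau^2\right). \tag{3}$$

The conditional posterior distribution of $\alpha_i$ is given by,

$$\pi(\alpha_i\mid\alpha,\tau^2,N)\propto\exp\left(n_i\alpha_i-L_i\exp(\alpha_i)\right)\times\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right). \tag{4}$$

The conditional posterior distribution of $\alpha$ is a normal distribution $N(\mu_\alpha,\sigma_\alpha^2)$ with mean and variance respectively given below,

$$\mu_\alpha=\frac{\dfrac{\sum_{i=1}^{m}\alpha_i}{\tau^2}+\dfrac{\mu_0}{\sigma_0^2}}{\dfrac{m}{\tau^2}+\dfrac{1}{\sigma_0^2}}\quad\text{and}\quad\sigma_\alpha^2=\frac{1}{\dfrac{m}{\tau^2}+\dfrac{1}{\sigma_0^2}}. \tag{5}$$

The conditional posterior distribution of $\tau^2$ is given by,

$$\pi(\tau^2\mid\alpha;\alpha_1,\ldots,\alpha_m;N)\propto(\tau^2)^{-\left(\alpha_0+\frac{m}{2}\right)-1}\exp\left(-\frac{\beta_0+\sum_{i=1}^{m}\frac{(\alpha_i-\alpha)^2}{2}}{\tau^2}\right). \tag{6}$$

Hence, $\tau^2$ has an inverse gamma distribution with shape $\alpha_0+\frac{m}{2}$ and rate $\beta_0+\sum_{i=1}^{m}\frac{(\alpha_i-\alpha)^2}{2}$.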
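The two closed-form conditionals in Equations (5) and (6) can be sampled directly. The sketch below shows one way to do this with Python's standard library; the function names and the input values are illustrative assumptions. The inverse gamma draw uses the fact that the reciprocal of a gamma variate with the same shape and rate is inverse gamma distributed.

```python
import random

def draw_alpha(alphas, tau2, mu0, sigma02, rng):
    """Draw alpha from N(mu_alpha, sigma_alpha^2) as in Equation (5)."""
    m = len(alphas)
    precision = m / tau2 + 1.0 / sigma02
    mean = (sum(alphas) / tau2 + mu0 / sigma02) / precision
    return rng.gauss(mean, (1.0 / precision) ** 0.5)

def draw_tau2(alphas, alpha, shape0, rate0, rng):
    """Draw tau^2 from the inverse gamma distribution in Equation (6)."""
    shape = shape0 + len(alphas) / 2.0
    rate = rate0 + sum((a - alpha) ** 2 for a in alphas) / 2.0
    # If G ~ Gamma(shape, rate) then 1/G ~ Inv-Gamma(shape, rate);
    # random.gammavariate takes a scale parameter, i.e. 1/rate.
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(0)
alphas = [0.67, 0.30, 0.67, 0.46]  # illustrative motorway log-intensities
alpha = draw_alpha(alphas, tau2=0.4, mu0=0.0, sigma02=100.0, rng=rng)
tau2 = draw_tau2(alphas, alpha, shape0=0.001, rate0=0.001, rng=rng)
```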

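Posterior if the prior distribution on $\tau$ is a uniform distribution

The joint posterior density is

$$\pi(\Theta\mid N)\propto\prod_{i=1}^{m}\exp\left(n_i\alpha_i-L_i\exp(\alpha_i)\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right)\times\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(\alpha-\mu_0)^2}{2\sigma_0^2}\right). \tag{7}$$

The conditional posterior distributions of $\alpha_i$ $(i=1,\ldots,m)$ and $\alpha$ are given in Equations (4) and (5). The conditional posterior density of $\tau$ is given by,

$$\pi(\tau\mid\alpha;\alpha_1,\ldots,\alpha_m;N)\propto\left(\frac{1}{\sqrt{2\pi\tau^2}}\right)^{m}\exp\left(-\frac{\sum_{i=1}^{m}(\alpha_i-\alpha)^2}{2\tau^2}\right). \tag{8}$$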

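Posterior if the prior distribution on $\tau$ is a half-normal distribution (HN) with parameter $\theta$

The joint posterior distribution satisfies

$$\pi(\Theta\mid N)\propto\prod_{i=1}^{m}\exp\left(n_i\alpha_i-L_i\exp(\alpha_i)\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right)\times\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(\alpha-\mu_0)^2}{2\sigma_0^2}\right)\times\frac{2\theta}{\pi}\exp\left(-\frac{\tau^2\theta^2}{\pi}\right). \tag{9}$$

The conditional posterior distributions of $\alpha_i$ $(i=1,\ldots,m)$ and $\alpha$ are the same as in Equations (4) and (5). The conditional posterior distribution of $\tau$ is

$$\pi(\tau\mid\alpha;\alpha_1,\ldots,\alpha_m;N)\propto\tau^{-m}\exp\left(-\frac{\sum_{i=1}^{m}(\alpha_i-\alpha)^2}{2\tau^2}-\frac{\tau^2\theta^2}{\pi}\right). \tag{10}$$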

3.1.5. Estimation

In Equations (5) and (6), the conditional posterior distributions of $\alpha$ and $\tau^2$ given the other parameters have known forms, but the conditional posterior distributions of $\alpha_i$ $(i=1,\ldots,m)$ in Equation (4) do not. Therefore, a Metropolis-Hastings within Gibbs sampler is used to simulate the Markov chains of $\alpha_i$ $(i=1,\ldots,m)$, $\alpha$ and $\tau^2$. A new value $\alpha_i'$ is simulated from a proposal distribution $q_1(\alpha_i',\alpha_i^{(t-1)})$, which is a normal distribution with mean equal to the current value $\alpha_i^{(t-1)}$ and a variance chosen such that the acceptance rate of $\alpha_i'$ is between 0.24 and 0.40 [17]. The proposed value is accepted with probability

$$r_1(\alpha_i^{(t-1)},\alpha_i')=\min\left\{\frac{\pi(\alpha_i'\mid\alpha^{(t-1)},\tau^{2(t-1)})\,q_1(\alpha_i^{(t-1)},\alpha_i')}{\pi(\alpha_i^{(t-1)}\mid\alpha^{(t-1)},\tau^{2(t-1)})\,q_1(\alpha_i',\alpha_i^{(t-1)})},1\right\}. \tag{11}$$

If the proposed value is rejected, the current value is taken as the next value in the Markov chain.
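The update for a single $\alpha_i$ can be sketched as a random-walk Metropolis-Hastings step. Because the normal proposal is symmetric, the two $q_1$ terms in Equation (11) cancel and only the ratio of conditional posteriors from Equation (4) remains. The code below is a sketch under assumptions: $\alpha$ and $\tau^2$ are held fixed at illustrative values rather than updated, the proposal standard deviation of 0.12 is a guess that would need tuning, and the data are the M1 row of Table 1 with the length in kilometres.

```python
import math
import random

def log_cond_alpha_i(a, n, L, alpha, tau2):
    """Log of the unnormalised conditional posterior in Equation (4)."""
    return n * a - L * math.exp(a) - (a - alpha) ** 2 / (2.0 * tau2)

def mh_step_alpha_i(current, n, L, alpha, tau2, proposal_sd, rng):
    """One random-walk Metropolis-Hastings update for alpha_i."""
    proposed = rng.gauss(current, proposal_sd)
    log_ratio = (log_cond_alpha_i(proposed, n, L, alpha, tau2)
                 - log_cond_alpha_i(current, n, L, alpha, tau2))
    # 1 - random() lies in (0, 1], which keeps the logarithm finite.
    if math.log(1.0 - rng.random()) < min(0.0, log_ratio):
        return proposed, True
    return current, False

rng = random.Random(42)
n, L = 593, 304.50  # M1 row of Table 1
value, accepted, draws = 0.0, 0, []
for t in range(5000):
    value, ok = mh_step_alpha_i(value, n, L, alpha=0.0, tau2=0.36,
                                proposal_sd=0.12, rng=rng)
    accepted += ok
    draws.append(value)
kept = draws[1000:]  # discard the first 1000 draws as burn-in
posterior_mean = sum(kept) / len(kept)
acceptance_rate = accepted / len(draws)
```

In a full sampler this step would be applied to every motorway in turn, followed by the draws of $\alpha$ and $\tau^2$ from their conditional distributions.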

The uniform prior distribution on $\tau$ leads to the conditional posterior distribution of $\tau$ given in Equation (8). This posterior distribution does not have a closed form; therefore, the Metropolis-Hastings sampler is used. The two steps are defined as follows. Firstly, we draw a proposed value, $\tau'$, from the proposal distribution $q_2(\tau',\tau^{(t-1)})$, which is a normal distribution with mean equal to the current value $\tau^{(t-1)}$. Secondly, the proposed value is accepted with probability

$$r_2(\tau^{(t-1)},\tau')=\min\left\{\frac{\pi(\tau'\mid\alpha^{(t)},\alpha_1^{(t)},\ldots,\alpha_m^{(t)})\,q_2(\tau^{(t-1)},\tau')}{\pi(\tau^{(t-1)}\mid\alpha^{(t)},\alpha_1^{(t)},\ldots,\alpha_m^{(t)})\,q_2(\tau',\tau^{(t-1)})},1\right\}. \tag{12}$$

The conditional posterior distribution for $\tau$ in Equation (10), obtained under a half-normal prior distribution, does not have a closed form either. Therefore, the Metropolis-Hastings sampler is used to simulate $\tau$. The sampler generates a proposed value $\tau'$ from the proposal distribution $q_2(\tau',\tau^{(t-1)})$, and the proposed value is accepted with the probability $r_2(\tau^{(t-1)},\tau')$ described in Equation (12). The proposal distribution $q_2(\cdot,\tau^{(t-1)})$ is a normal distribution with the current state $\tau^{(t-1)}$ as the mean.

3.2. Two-level frequentist hierarchical method (Model 2)

In this section, we describe a two-stage approach to modelling the accident data on the UK motorway network using the frequentist method. In stage one, we use the maximum likelihood method to estimate the log-intensity function $y_i$ and the corresponding standard deviation $\sigma_i$. In stage two, the estimated log-intensity functions are combined across motorways to produce an overall estimate. The model can be formulated as follows,

$$y_i\sim N(\alpha_i,\sigma_i^2),\qquad \alpha_i\sim N(\alpha,\tau^2). \tag{13}$$

Here $y_i$ represents the estimated intensity on the log scale for motorway $i$, $\alpha_i$ represents the random effect for the log-intensity and $\sigma_i^2$ is the within-motorway variance corresponding to $y_i$; $\alpha$ is the overall intensity on the log scale and $\tau^2$ represents the between-motorway heterogeneity. The marginal distribution of each estimated log-intensity $y_i$ is a normal distribution with mean $\alpha$ and variance $\sigma_i^2+\tau^2$ [19]. Hence, the contribution of motorway $i$ to the likelihood for $\alpha$ and $\tau^2$ is given by,

$$L_i(\alpha,\tau^2\mid y_i,\sigma_i^2)=\frac{1}{\sqrt{2\pi(\sigma_i^2+\tau^2)}}\exp\left(-\frac{(y_i-\alpha)^2}{2(\sigma_i^2+\tau^2)}\right). \tag{14}$$

For $m$ independent motorways, the likelihood is given by the product of the individual motorway likelihoods as follows,

$$L(\alpha,\tau^2\mid y,\sigma^2)=\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi(\sigma_i^2+\tau^2)}}\exp\left(-\frac{(y_i-\alpha)^2}{2(\sigma_i^2+\tau^2)}\right). \tag{15}$$
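One simple way to maximise the likelihood in Equation (15) is sketched below. For a fixed $\tau^2$, this normal likelihood is maximised in $\alpha$ by the inverse-variance weighted mean of the $y_i$, so the sketch profiles over a grid of $\tau^2$ values. The grid search is a stand-in chosen for brevity, not the optimiser used in the study. The stage-one inputs are also assumptions: $y_i=\log(n_i/L_i)$ and $\sigma_i^2=1/n_i$, computed for five motorways from Table 1 with lengths in kilometres.

```python
import math

def log_lik(alpha, tau2, y, s2):
    """Logarithm of the likelihood in Equation (15)."""
    return sum(-0.5 * math.log(2 * math.pi * (v + tau2))
               - (yi - alpha) ** 2 / (2 * (v + tau2))
               for yi, v in zip(y, s2))

def alpha_given_tau2(tau2, y, s2):
    """Inverse-variance weighted mean, the maximiser in alpha for fixed tau^2."""
    weights = [1.0 / (v + tau2) for v in s2]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

def fit(y, s2, grid):
    """Profile the likelihood over a grid of tau^2 values (a crude optimiser)."""
    best = max(grid, key=lambda t2: log_lik(alpha_given_tau2(t2, y, s2), t2, y, s2))
    return alpha_given_tau2(best, y, s2), best

# M1, M2, M3, M4 and M5 from Table 1.
counts = [593, 53, 188, 475, 263]
lengths = [304.50, 39.39, 95.89, 299.84, 256.22]
y = [math.log(n / L) for n, L in zip(counts, lengths)]  # assumed stage-one estimates
s2 = [1.0 / n for n in counts]                          # assumed stage-one variances
grid = [k / 1000.0 for k in range(0, 1001)]             # tau^2 from 0 to 1
alpha_hat, tau2_hat = fit(y, s2, grid)
```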

3.3. Three-level Bayesian hierarchical model (Model 3)

3.3.1. Model definition

In the three-level hierarchical model, the number of accidents within grouped segments follows a homogeneous process but a non-homogeneous process across grouped segments and motorways. Let $m$ denote the total number of motorways and $s_i$ $(i=1,\ldots,m)$ the number of grouped segments for each motorway $i$. Suppose that the intensity of accidents per kilometre is $\lambda_{ij}$ $(i=1,\ldots,m; j=1,\ldots,s_i)$, where $i$ is the index of the motorway and $j$ is the index of the grouped segment. The number of accidents $n_{ij}$ on each grouped segment follows a Poisson distribution with mean $\lambda_{ij}L_{ij}$, where $L_{ij}$ represents the length (in kilometres) of grouped segment $j$ on motorway $i$. Let $\alpha_{ij}=\log\lambda_{ij}$ denote the log-intensity function. The three-level hierarchical model is given below,

$$n_{ij}\sim\mathrm{Pois}(\lambda_{ij}L_{ij}),\quad i=1,\ldots,m;\ j=1,\ldots,s_i,\qquad \alpha_{ij}\sim N(\alpha_i,\tau_i^2),\qquad \alpha_i\sim N(\alpha,\tau^2). \tag{16}$$

The second level includes the log-intensity of accidents, $\alpha_{ij}$, on each grouped segment and the log-intensity of accidents, $\alpha_i$, on each motorway, as well as the heterogeneity between grouped segments, $\tau_i^2$. The third level includes the overall log-intensity of accidents $\alpha$ and the between-motorway heterogeneity, $\tau^2$. The intensity of accidents is constant on grouped segments that have the same mark, but it varies across grouped segments and motorways.

3.3.2. Likelihood function

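Let $\Theta_3$ denote the model parameters $\{\alpha_{11},\ldots,\alpha_{ms_m};\alpha_1,\ldots,\alpha_m;\tau_1^2,\ldots,\tau_m^2;\alpha;\tau^2\}$ in the three-level hierarchical model. Let $\gamma=\{\alpha_{11},\ldots,\alpha_{ms_m}\}$ denote the log-intensities of accidents on grouped segments, $\boldsymbol{\alpha}=\{\alpha_1,\ldots,\alpha_m\}$ and $\boldsymbol{\tau}^2=\{\tau_1^2,\ldots,\tau_m^2\}$. It is assumed that the accidents are uniformly distributed within grouped segments.

The likelihood function for the proposed three-level hierarchical model is given by,

$$L(N\mid\Theta_3)=P(N\mid\gamma)\times P(\gamma\mid\boldsymbol{\alpha},\boldsymbol{\tau}^2)\times P(\boldsymbol{\alpha}\mid\alpha,\tau^2)\propto\prod_{i=1}^{m}\prod_{j=1}^{s_i}\exp\left(n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij})\right)\times\prod_{i=1}^{m}\prod_{j=1}^{s_i}\frac{1}{\sqrt{2\pi\tau_i^2}}\exp\left(-\frac{(\alpha_{ij}-\alpha_i)^2}{2\tau_i^2}\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right). \tag{17}$$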

3.3.3. Posterior distribution

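The same prior distributions for $\alpha$ and $\tau^2$ as the ones described in Section 3.1.3 are considered.

Posterior if the prior distribution on $\tau^2$ is an inverse gamma distribution with shape $a_0$ and rate $b_0$

A joint posterior distribution for the parameters $\Theta_3$ is given by,

$$\pi(\Theta_3\mid N)\propto\prod_{i=1}^{m}\prod_{j=1}^{s_i}\exp\left(n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij})\right)\times\prod_{i=1}^{m}\prod_{j=1}^{s_i}\frac{1}{\sqrt{2\pi\tau_i^2}}\exp\left(-\frac{(\alpha_{ij}-\alpha_i)^2}{2\tau_i^2}\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right)\times\prod_{i=1}^{m}\frac{b_0^{a_0}}{\Gamma(a_0)}(\tau_i^2)^{-a_0-1}\exp\left(-\frac{b_0}{\tau_i^2}\right)\times\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(\alpha-\mu_0)^2}{2\sigma_0^2}\right)\times\frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}(\tau^2)^{-\alpha_0-1}\exp\left(-\frac{\beta_0}{\tau^2}\right). \tag{18}$$

The conditional posterior distribution of $\alpha_{ij}$ is given by,

$$\pi(\alpha_{ij}\mid\boldsymbol{\alpha},\boldsymbol{\tau}^2,N)\propto\exp\left(n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij})\right)\exp\left(-\frac{(\alpha_{ij}-\alpha_i)^2}{2\tau_i^2}\right). \tag{19}$$

The conditional posterior distribution of $\alpha_i$ is a normal distribution $N(\mu_{\alpha_i},\sigma_{\alpha_i}^2)$ with mean and variance:

$$\mu_{\alpha_i}=\frac{\dfrac{\sum_{j=1}^{s_i}\alpha_{ij}}{\tau_i^2}+\dfrac{\alpha}{\tau^2}}{\dfrac{s_i}{\tau_i^2}+\dfrac{1}{\tau^2}}\quad\text{and}\quad\sigma_{\alpha_i}^2=\frac{1}{\dfrac{s_i}{\tau_i^2}+\dfrac{1}{\tau^2}}. \tag{20}$$

The conditional posterior distribution of $\tau_i^2$ is given by,

$$\tau_i^2\sim\text{Inv-Gamma}\left(\frac{s_i}{2}+a_0,\ \frac{\sum_{j=1}^{s_i}(\alpha_{ij}-\alpha_i)^2}{2}+b_0\right). \tag{21}$$

The conditional posterior distribution of $\alpha$ is $N(\mu_\alpha,\sigma_\alpha^2)$ with mean and variance given below, respectively,

$$\mu_\alpha=\frac{\dfrac{\sum_{i=1}^{m}\alpha_i}{\tau^2}+\dfrac{\mu_0}{\sigma_0^2}}{\dfrac{m}{\tau^2}+\dfrac{1}{\sigma_0^2}}\quad\text{and}\quad\sigma_\alpha^2=\frac{1}{\dfrac{m}{\tau^2}+\dfrac{1}{\sigma_0^2}}. \tag{22}$$

The conditional posterior distribution of $\tau^2$ is given by,

$$\tau^2\sim\text{Inv-Gamma}\left(\frac{m}{2}+\alpha_0,\ \frac{\sum_{i=1}^{m}(\alpha_i-\alpha)^2}{2}+\beta_0\right). \tag{23}$$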

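Posterior if the prior distribution on $\tau$ is a uniform distribution

The probability density function of the uniform prior distribution on $\tau$ is constant, so it does not appear in the joint posterior distribution. Hence, the joint posterior distribution is given by,

$$\pi(\Theta_3\mid N)\propto\prod_{i=1}^{m}\prod_{j=1}^{s_i}\exp\left(n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij})\right)\times\prod_{i=1}^{m}\prod_{j=1}^{s_i}\frac{1}{\sqrt{2\pi\tau_i^2}}\exp\left(-\frac{(\alpha_{ij}-\alpha_i)^2}{2\tau_i^2}\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right)\times\prod_{i=1}^{m}\frac{b_0^{a_0}}{\Gamma(a_0)}(\tau_i^2)^{-a_0-1}\exp\left(-\frac{b_0}{\tau_i^2}\right)\times\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(\alpha-\mu_0)^2}{2\sigma_0^2}\right). \tag{24}$$

The conditional posterior distributions of $\alpha_{ij}$, $\alpha_i$, $\tau_i^2$ $(i=1,\ldots,m; j=1,\ldots,s_i)$ and $\alpha$ are the same as in Equations (19)–(22). The conditional posterior distribution of $\tau$ is given by,

$$\pi(\tau\mid\boldsymbol{\alpha},\alpha,N)\propto\left(\frac{1}{\sqrt{2\pi\tau^2}}\right)^{m}\exp\left(-\frac{\sum_{i=1}^{m}(\alpha_i-\alpha)^2}{2\tau^2}\right). \tag{25}$$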

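Posterior if the prior distribution on $\tau$ is a half-normal (HN) distribution with parameter $\theta$

If we use $HN(\theta)$ as the prior for $\tau$, the joint posterior distribution is given by,

$$\pi(\Theta_3\mid N)\propto\prod_{i=1}^{m}\prod_{j=1}^{s_i}\exp\left(n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij})\right)\times\prod_{i=1}^{m}\prod_{j=1}^{s_i}\frac{1}{\sqrt{2\pi\tau_i^2}}\exp\left(-\frac{(\alpha_{ij}-\alpha_i)^2}{2\tau_i^2}\right)\times\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{(\alpha_i-\alpha)^2}{2\tau^2}\right)\times\prod_{i=1}^{m}\frac{b_0^{a_0}}{\Gamma(a_0)}(\tau_i^2)^{-a_0-1}\exp\left(-\frac{b_0}{\tau_i^2}\right)\times\frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\left(-\frac{(\alpha-\mu_0)^2}{2\sigma_0^2}\right)\times\frac{2\theta}{\pi}\exp\left(-\frac{\tau^2\theta^2}{\pi}\right). \tag{26}$$

The conditional posterior distributions of $\alpha_{ij}$, $\alpha_i$, $\tau_i^2$ $(i=1,\ldots,m; j=1,\ldots,s_i)$ and $\alpha$ are the same as in Equations (19)–(22). The conditional posterior distribution of $\tau$ is given by,

$$\pi(\tau\mid\boldsymbol{\alpha},\alpha,N)\propto\tau^{-m}\exp\left(-\frac{\sum_{i=1}^{m}(\alpha_i-\alpha)^2}{2\tau^2}-\frac{\tau^2\theta^2}{\pi}\right). \tag{27}$$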

3.3.4. Estimation

Bayesian estimation of the three-level hierarchical model is performed using a Metropolis-Hastings within Gibbs sampler. We generate random samples from the conditional posterior distributions of $\alpha$, $\alpha_i$, $\tau^2$ and $\tau_i^2$ $(i=1,\ldots,m)$, respectively. The conditional posterior distributions of $\alpha_{ij}$ $(i=1,\ldots,m; j=1,\ldots,s_i)$ do not have closed forms. In this case, the Metropolis-Hastings algorithm is used. Normal proposal distributions are specified for $\alpha_{ij}$ $(i=1,\ldots,m; j=1,\ldots,s_i)$ with mean $\alpha_{ij}^{(t-1)}$ and variance $\sigma_{ij}^2$, where $t$ $(t=1,\ldots,M)$ is the iteration index. The variance $\sigma_{ij}^2$ is chosen such that the acceptance rate is between 0.24 and 0.40 [17].

A value $\alpha_{ij}'$ generated from the proposal distribution $q_1(\alpha_{ij}',\alpha_{ij}^{(t-1)})$ is accepted with probability

$$r_1(\alpha_{ij}^{(t-1)},\alpha_{ij}')=\min\left\{\frac{\pi(\alpha_{ij}'\mid\alpha_i^{(t-1)},\tau_i^{2(t-1)})\,q_1(\alpha_{ij}^{(t-1)},\alpha_{ij}')}{\pi(\alpha_{ij}^{(t-1)}\mid\alpha_i^{(t-1)},\tau_i^{2(t-1)})\,q_1(\alpha_{ij}',\alpha_{ij}^{(t-1)})},1\right\}. \tag{28}$$

A new value $\tau'$ is generated from a proposal distribution $q_2(\tau',\tau^{(t-1)})$, which is a normal distribution with mean equal to the current value $\tau^{(t-1)}$. The new value $\tau'$ is accepted with probability

$$r_2(\tau^{(t-1)},\tau')=\min\left\{\frac{\pi(\tau'\mid\alpha^{(t)},\alpha_1^{(t)},\ldots,\alpha_m^{(t)})\,q_2(\tau^{(t-1)},\tau')}{\pi(\tau^{(t-1)}\mid\alpha^{(t)},\alpha_1^{(t)},\ldots,\alpha_m^{(t)})\,q_2(\tau',\tau^{(t-1)})},1\right\}. \tag{29}$$

3.4. Three-level frequentist hierarchical method (Model 4)

Maximum likelihood estimation is used in three stages to estimate the parameters of the model in Equation (16).

In stage one, the log-intensity of accidents $\alpha_{ij}$ is estimated for each grouped segment, using the relevant part of the likelihood function in Equation (17),

$$L_{ij}(\alpha_{ij};N)=\exp\left(n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij})\right). \tag{30}$$

The log-likelihood function is

$$\ell_{ij}(\alpha_{ij};N)=n_{ij}\alpha_{ij}-L_{ij}\exp(\alpha_{ij}). \tag{31}$$

The maximum likelihood estimate (M.L.E.) of $\alpha_{ij}$ is given by,

$$\hat{\alpha}_{ij}=\log n_{ij}-\log L_{ij}. \tag{32}$$

To calculate the standard error of $\hat{\alpha}_{ij}$, we use the Fisher information $I(\hat{\alpha}_{ij})$, which here is a scalar:

$$I(\hat{\alpha}_{ij})=-E\left[H(\hat{\alpha}_{ij})\right]=-E\left[\frac{\partial^2\ell_{ij}}{\partial\alpha_{ij}^2}\right]=L_{ij}\exp(\alpha_{ij}), \tag{33}$$

where $H(\hat{\alpha}_{ij})$ represents the Hessian matrix. The square root of the inverse of the Fisher information is an estimator of the standard error of $\hat{\alpha}_{ij}$,

$$SE(\hat{\alpha}_{ij})=\sqrt{I^{-1}(\hat{\alpha}_{ij})}=\left(L_{ij}\exp(\alpha_{ij})\right)^{-\frac{1}{2}}. \tag{34}$$
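Stage one reduces to two closed-form expressions per grouped segment, sketched below. Evaluating Equation (34) at the estimate gives $L_{ij}\exp(\hat{\alpha}_{ij})=n_{ij}$, so the standard error is $1/\sqrt{n_{ij}}$. The estimate is not finite when a segment has no accidents, and the guard for that case is an assumption of this sketch, since the text does not say how empty segments are handled. The input values are made up for illustration.

```python
import math

def stage_one(n_ij, L_ij):
    """MLE of the log-intensity (Equation (32)) and its standard error
    (Equation (34)) evaluated at the MLE."""
    if n_ij <= 0:
        raise ValueError("the MLE is not finite when no accidents are observed")
    alpha_hat = math.log(n_ij) - math.log(L_ij)
    se = (L_ij * math.exp(alpha_hat)) ** -0.5
    return alpha_hat, se

# Hypothetical grouped segment: 9 accidents on 4.5 km.
alpha_hat, se = stage_one(9, 4.5)
```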

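In stage two, the log-intensity of accidents $\alpha_i$ $(i=1,\ldots,m)$ is estimated for each motorway. Here, the point estimates $\hat{\alpha}_{ij}$ $(i=1,\ldots,m; j=1,\ldots,s_i)$ are substituted into the likelihood function in Equation (17). The relevant part of the likelihood function is given by,

$$L_i(\boldsymbol{\alpha},\boldsymbol{\tau}^2;\hat{\gamma})=\prod_{i=1}^{m}\prod_{j=1}^{s_i}\frac{1}{\sqrt{2\pi\tau_i^2}}\exp\left(-\frac{(\hat{\alpha}_{ij}-\alpha_i)^2}{2\tau_i^2}\right). \tag{35}$$

The log-likelihood function, up to an additive constant, is

$$\ell_i(\boldsymbol{\alpha},\boldsymbol{\tau}^2;\hat{\gamma})=\sum_{i=1}^{m}\left[-\frac{s_i}{2}\log\tau_i^2-\frac{\sum_{j=1}^{s_i}(\hat{\alpha}_{ij}-\alpha_i)^2}{2\tau_i^2}\right]. \tag{36}$$

The maximum likelihood estimates $\hat{\alpha}_i$ and $\hat{\tau}_i^2$ are, respectively, given by,

$$\hat{\alpha}_i=\frac{\sum_{j=1}^{s_i}\hat{\alpha}_{ij}}{s_i}\quad\text{and}\quad\hat{\tau}_i^2=\frac{\sum_{j=1}^{s_i}(\hat{\alpha}_{ij}-\hat{\alpha}_i)^2}{s_i}. \tag{37}$$

To obtain the standard errors of $\hat{\alpha}_i$ and $\hat{\tau}_i^2$, we use the Fisher information matrix as follows,

$$I(\hat{\alpha}_i,\hat{\tau}_i^2)=-E\left[H(\hat{\alpha}_i,\hat{\tau}_i^2)\right]=-E\begin{bmatrix}\dfrac{\partial^2\ell_i}{\partial\alpha_i^2} & \dfrac{\partial^2\ell_i}{\partial\alpha_i\,\partial\tau_i^2}\\[2ex] \dfrac{\partial^2\ell_i}{\partial\tau_i^2\,\partial\alpha_i} & \dfrac{\partial^2\ell_i}{\partial(\tau_i^2)^2}\end{bmatrix}=\begin{bmatrix}\dfrac{s_i}{\tau_i^2} & 0\\[2ex] 0 & \dfrac{s_i}{2\tau_i^4}\end{bmatrix}, \tag{38}$$

where $H(\hat{\alpha}_i,\hat{\tau}_i^2)$ represents the Hessian matrix. The inverse of the Fisher information matrix is given by,

$$I^{-1}(\hat{\alpha}_i,\hat{\tau}_i^2)=\begin{bmatrix}\dfrac{\tau_i^2}{s_i} & 0\\[2ex] 0 & \dfrac{2\tau_i^4}{s_i}\end{bmatrix}. \tag{39}$$

The standard errors of $\hat{\alpha}_i$ and $\hat{\tau}_i^2$ are the square roots of the diagonal elements in Equation (39), and hence are given by,

$$SE(\hat{\alpha}_i)=\sqrt{\frac{\tau_i^2}{s_i}}, \tag{40}$$

$$SE(\hat{\tau}_i^2)=\sqrt{\frac{2\tau_i^4}{s_i}}. \tag{41}$$

In stage three, $\alpha$ and $\tau^2$ are estimated by maximising the following likelihood,

$$L(\alpha,\tau^2;\hat{\alpha}_i)=\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left\{-\frac{(\hat{\alpha}_i-\alpha)^2}{2\tau^2}\right\}. \tag{42}$$

The estimated values $\hat{\alpha}_i$ and $SE(\hat{\alpha}_i)$ are used as data in Equation (42).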

4. Application to UK motorway accident data

Non-informative and weakly informative prior distributions

A non-informative prior distribution reflects the lack of prior information about a parameter [30]. A conjugate prior could be non-informative, such as $\mathrm{InvGamma}(0.001,0.001)$, or weakly-informative, such as $\mathrm{InvGamma}(0.1,0.1)$, for $\tau^2$. As a sensitivity analysis, we use the uniform prior $\mathrm{unif}(0,100)$ and a half-normal prior distribution $HN(0.14)$ for $\tau$, both of which are non-informative priors on $\tau$ [28,46]. We used $\mathrm{InvGamma}(0.001,0.001)$ as the prior for $\tau_i^2$ $(i=1,\ldots,m)$. Finally, a non-informative normal prior distribution with mean $\mu_0=0$ and variance $\sigma_0^2=100$ was used for $\alpha$.

Informative prior distributions

An informative prior describes specific pre-existing information about a parameter [30]. The maximum likelihood estimates of $\tau^2$ and $\alpha$ and their standard errors, obtained from traffic accident data from an earlier year (e.g. 2015), are used to specify the informative priors in the Bayesian analysis of accident data in the subsequent year. More specifically, the parameters of the inverse gamma prior are calculated by solving the following equations:

$$E(\tau^2)=\frac{\alpha_0}{\beta_0}=\hat{\tau}^2_{ML},\qquad \mathrm{var}(\tau^2)=\frac{\alpha_0}{\beta_0^2}=\mathrm{var}(\hat{\tau}^2_{ML}), \tag{43}$$

where $\hat{\tau}^2_{ML}$ is the maximum likelihood estimate, 0.3162, and $\mathrm{var}(\hat{\tau}^2_{ML})$ is the variance of $\hat{\tau}^2_{ML}$, $(0.0738)^2$; both are obtained from analysing the UK motorway accident data in 2015. Solving the equations in (43), we obtain $\alpha_0=18.36$ and $\beta_0=58.06$. Thus, the informative prior for $\tau^2$ is $\mathrm{InvGamma}(18.36,58.06)$. Similarly, $\mu_0=-6.65$ and $\sigma_0^2=0.09^2$.
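The arithmetic behind these hyperparameters can be checked in a few lines. The sketch below solves the two equations in (43) exactly as written, using the 2015 estimates quoted in the text; the function name is hypothetical.

```python
def match_moments(mean, variance):
    """Solve alpha0 / beta0 = mean and alpha0 / beta0**2 = variance."""
    beta0 = mean / variance
    alpha0 = mean * beta0
    return alpha0, beta0

alpha0, beta0 = match_moments(mean=0.3162, variance=0.0738 ** 2)
print(round(alpha0, 2), round(beta0, 2))  # 18.36 58.06
```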

Results

Both two-level and three-level hierarchical models are used to analyse the observed accident data in 2016 for the UK motorways. Model parameters include the overall log-intensity of traffic accidents α, and the between-motorway standard deviation, τ. The MCMC simulation requires specifying starting points for the parameters. The initial values 0 and 0.1 are specified for α and τ, respectively. The MCMC algorithm was run on the two-level hierarchical model for 100,000 iterations with a burn-in period of 10,000 and a thinning interval of 10; and on the three-level hierarchical model, 500,000 iterations with a burn-in period of 50,000 and a thinning interval of 100. The number of iterations of the MCMC is different between the two-level and three-level hierarchical models to ensure convergence to the posterior distributions.

Table 2 shows that the two-level Bayesian hierarchical model with different priors gives similar estimates of both the parameters $\alpha$ and $\tau$ and their standard deviations as well as their credible intervals. This indicates that the estimates of the model parameters are robust to the choice of prior. The estimated $\alpha$ and $\tau$ using maximum likelihood estimation are similar to those from the two-level Bayesian hierarchical model. Under the three-level Bayesian hierarchical model, the posterior mean, standard deviation and 95% credible interval for $\alpha$ and $\tau$ are similar when non-informative and weakly-informative prior distributions are used. The credible interval of $\alpha$ based on the Bayesian method with an informative prior is narrower than the one estimated with non-informative and weakly-informative priors. In addition, the standard deviation for $\hat{\alpha}$ based on the Bayesian method with informative priors is smaller than that based on the Bayesian method with other priors. The estimated $\alpha$ and $\tau$ are similar between the frequentist and the Bayesian approaches, although the standard deviations are, in general, slightly smaller in the frequentist approach (see Table 2), as expected.

Table 2.

Results from the two-level Bayesian hierarchical model, three-level Bayesian hierarchical model, two-level frequentist hierarchical model and three-level frequentist hierarchical model of traffic accidents for 2016.

| Method | Prior distribution | Parameter | Two-level: point estimate | Two-level: SD | Two-level: 95% CI | Three-level: point estimate | Three-level: SD | Three-level: 95% CI |
|---|---|---|---|---|---|---|---|---|
| Bayesian | $\tau^2\sim\mathrm{InvGamma}(0.001,0.001)$ | $\alpha$ | −6.82 | 0.10 | (−7.01, −6.64) | −6.93 | 0.11 | (−7.15, −6.71) |
| | | $\tau$ | 0.60 | 0.08 | (0.47, 0.77) | 0.65 | 0.09 | (0.49, 0.85) |
| | $\tau^2\sim\mathrm{InvGamma}(0.1,0.1)$ | $\alpha$ | −6.86 | 0.11 | (−7.07, −6.65) | −6.93 | 0.11 | (−7.15, −6.71) |
| | | $\tau$ | 0.70 | 0.09 | (0.54, 0.88) | 0.66 | 0.09 | (0.50, 0.85) |
| | $\tau\sim\mathrm{unif}(0,100)$ | $\alpha$ | −6.86 | 0.11 | (−7.07, −6.64) | −6.93 | 0.11 | (−7.16, −6.71) |
| | | $\tau$ | 0.70 | 0.09 | (0.55, 0.90) | 0.67 | 0.09 | (0.51, 0.87) |
| | $\tau\sim HN(0.14)$ | $\alpha$ | −6.86 | 0.11 | (−7.07, −6.65) | −6.93 | 0.11 | (−7.16, −6.71) |
| | | $\tau$ | 0.70 | 0.09 | (0.55, 0.89) | 0.67 | 0.09 | (0.51, 0.87) |
| | $\alpha\sim N(-6.65,0.09^2)$ | $\alpha$ | −6.62 | 0.09 | (−6.79, −6.43) | −6.70 | 0.08 | (−6.86, −6.55) |
| | $\tau^2\sim\mathrm{InvGamma}(18.36,58.06)$ | $\tau$ | 1.40 | 0.23 | (1.14, 2.08) | 1.34 | 0.11 | (1.15, 1.57) |
| Frequentist | | $\alpha$ | −6.81 | 0.10 | (−7.00, −6.62) | −6.64 | 0.08 | (−6.80, −6.49) |
| | | $\tau$ | 0.64 | 0.10 | (0.54, 0.89) | 0.51 | 0.06 | (0.41, 0.66) |

Notes: The prior of $\alpha$ is $N(0,100)$. HN represents the half-normal distribution. SD: standard deviation and CI: credible interval or confidence interval.

Table 3 shows the estimated overall intensity of accidents per one kilometre. It is clear that results for λ from the three-level Bayesian hierarchical model are similar, except for the informative prior where λ is greater but close to the estimate from the frequentist method. The two-level Bayesian hierarchical model produces similar results under non-informative and weakly-informative priors of τ.

Table 3.

Estimates of overall accident intensity $\lambda$ per one kilometre for the UK motorway network in 2016: $\lambda=1000\times\exp(\alpha)$.

| Method | Prior distribution | Two-level: point estimate | Two-level: 95% CI | Three-level: point estimate | Three-level: 95% CI |
|---|---|---|---|---|---|
| Bayesian | $\tau^2\sim\mathrm{InvGamma}(0.001,0.001)$ | 1.09 | (0.90, 1.31) | 0.98 | (0.79, 1.22) |
| | $\tau^2\sim\mathrm{InvGamma}(0.1,0.1)$ | 1.05 | (0.85, 1.29) | 0.98 | (0.79, 1.22) |
| | $\tau\sim\mathrm{unif}(0,100)$ | 1.05 | (0.85, 1.31) | 0.98 | (0.78, 1.22) |
| | $\tau\sim HN(0.14)$ | 1.05 | (0.85, 1.29) | 0.98 | (0.78, 1.22) |
| | $\alpha\sim N(-6.65,0.09^2)$ and $\tau^2\sim\mathrm{InvGamma}(18.36,58.06)$ | 1.33 | (1.12, 1.61) | 1.23 | (1.05, 1.43) |
| Frequentist | | 1.10 | (0.91, 1.33) | 1.31 | (1.12, 1.53) |

Notes: The prior of $\alpha$ is $N(0,100)$. CI: credible interval or confidence interval. HN represents the half-normal distribution.
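The caption of Figure 2 gives the conversion $\lambda_i=1000\times\exp(\alpha_i)$ for intensities per kilometre. As a check, the sketch below applies the same conversion to the overall log-intensity and its interval endpoints from the first row of Table 2, which reproduces the first row of Table 3 for the two-level model. Transforming the interval endpoints is valid for quantile-based intervals because the exponential function is increasing; the function name is hypothetical.

```python
import math

def intensity_per_km(alpha):
    """Convert a log-intensity to accidents per kilometre: 1000 * exp(alpha)."""
    return 1000.0 * math.exp(alpha)

# Two-level model, tau^2 ~ InvGamma(0.001, 0.001): alpha = -6.82, 95% CI (-7.01, -6.64).
point = intensity_per_km(-6.82)
interval = (intensity_per_km(-7.01), intensity_per_km(-6.64))
print(round(point, 2), tuple(round(x, 2) for x in interval))  # 1.09 (0.9, 1.31)
```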

Figure 2 shows 49 motorways with intensities of traffic accidents per one kilometre and their corresponding credible intervals for both two-level and three-level Bayesian hierarchical models.

Figure 2.

Results from the two-level Bayesian hierarchical model and the three-level Bayesian hierarchical model for accident data on the UK motorways in 2016. In the two-level Bayesian hierarchical model, the following prior distributions are used: $\alpha\sim N(0,100)$ and $\tau^2\sim\mathrm{InvGamma}(0.001,0.001)$. In the three-level Bayesian hierarchical model, the prior distributions are $\alpha\sim N(-6.65,0.09^2)$ and $\tau^2\sim\mathrm{InvGamma}(18.36,58.06)$. Results include the posterior mean and the corresponding 95% credible interval for the intensity of accidents $\lambda_i=1000\times\exp(\alpha_i)$ per one kilometre on each motorway and the overall intensity of accidents $\lambda$ per one kilometre. Square boxes represent posterior means of $\lambda_i$ $(i=1,\ldots,m)$. The diamond represents the estimated overall intensity of accidents $\lambda$ and its 95% credible interval. Horizontal lines denote 95% credible intervals and the solid vertical line represents the posterior mean of the overall intensity $\lambda$. (a) Two-level Bayesian hierarchical model. (b) Three-level Bayesian hierarchical model.

Based on the estimation results shown in Figure 2 for the Bayesian hierarchical models applied to the UK motorway data, the estimated intensities of accidents on the UK motorway network are classified into five categories. Category one ($\lambda<0.5$) is referred to as very low risk; category two ($0.5\le\lambda<1$) as low risk; category three ($1\le\lambda<2$) as moderate risk; category four ($2\le\lambda<3$) as high risk; and category five ($\lambda\ge 3$) as very high risk. Based on the results from the Bayesian three-level hierarchical model, the moderate-risk level represents the general level of accident intensity on the UK motorway network. Based on the results presented in Figures 2 and 3, motorways M27, M275 and M32 are at high risk, whereas M25 and M606 are the highest-risk motorways, with an expected number of accidents above 3 per kilometre for both. On the other hand, motorways M9, M90, M58, M45, M48, M49, M180, M54, M74 and M50 have the lowest risk, with an expected number of accidents below 0.5 per kilometre.
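The five categories amount to a simple threshold rule on the estimated intensity per kilometre, sketched below. The function name and the returned labels are illustrative choices.

```python
def risk_category(lam):
    """Map an intensity (accidents per kilometre) to one of the five risk categories."""
    if lam < 0.5:
        return "very low"
    if lam < 1:
        return "low"
    if lam < 2:
        return "moderate"
    if lam < 3:
        return "high"
    return "very high"

# An intensity of exactly 1 falls in the moderate category because the
# lower bound of each category is inclusive.
example = risk_category(1.0)
```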

Figure 3.

Estimated intensity of traffic accidents per one kilometre on the UK motorway network in 2016. The intensities are estimated using the three-level Bayesian hierarchical model with prior distributions $\alpha\sim N(-6.65,0.09^2)$ and $\tau^2\sim\mathrm{InvGamma}(18.36,58.06)$.

Based on the results of the two-level Bayesian hierarchical model, Figures 2 and 4 show that the general level of accident intensity on the UK motorway network is moderate risk; the moderate-risk motorways are M32, M1, M3, M20, M8, M6, M65, M62, M11, M4, M271, M42, M61, M2, M40, M57, M23, M60, M56, M898, M66, M602, M18 and M5. The M25, which encircles almost all of Greater London except North Ockendon, and the M27 in Hampshire, which runs west to east from Cadnam to Portsmouth, have a very high risk level. The expected numbers of accidents are 3.59 per kilometre on the M25 and 3.03 per kilometre on the M27. Motorways M54, M180, M74 and M50 are the lowest-risk motorways, with estimated intensities of 4.4, 4.4, 4.2 and 3.3 per 10 kilometres, respectively. Figure 4, moreover, illustrates that the risk level for motorways M606, M621 and M275 is high: the expected number of accidents is 2.36 per kilometre on the M606 and 2.09 per kilometre on both the M621 and the M275.

Figure 4.

Estimated intensity of traffic accidents per one kilometre on the UK motorway network in 2016. The intensities are estimated using the two-level Bayesian hierarchical model with prior distributions α ∼ N(0, 100) and τ² ∼ InvGamma(0.001, 0.001).

5. Simulation study

We conducted a simulation study to assess the performance of the proposed models.

5.1. Simulation design

For the two-level hierarchical model, we considered six scenarios with different true values of the parameters α and τ. The true values of α are set to −1 and −7. If the overall log-intensity α is chosen to be lower than −9, the number of accidents on the motorway will be very close to zero. The between-motorway standard deviation τ is set to 0.3, 0.8 and 1.5 to reflect the variation between motorways. A magnitude of 0.3 indicates little variation in the motorway-specific log-intensity, while a magnitude of 1.5 results in much more variation between motorways. These true values are chosen to be close to the results for the observed data set. The log-intensity α_i on motorway i (i = 1, …, m) is drawn from a normal distribution with mean α and standard deviation τ. The number of accidents n_i on motorway i is then generated from a Poisson distribution with mean L_i exp(α_i), where L_i is the length of motorway i. We simulated 1,000 data sets for each scenario.
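The two-level data-generating process can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name, the seed, the number of motorways and the motorway lengths are all hypothetical placeholders (the study uses the real network lengths), and the true α values are taken as −1 and −7 as in Table 4.

```python
import numpy as np


def simulate_two_level(alpha, tau, lengths, rng):
    """Draw one data set: alpha_i ~ N(alpha, tau^2), n_i ~ Pois(L_i * exp(alpha_i))."""
    lengths = np.asarray(lengths, dtype=float)
    alpha_i = rng.normal(loc=alpha, scale=tau, size=lengths.size)
    counts = rng.poisson(lam=lengths * np.exp(alpha_i))
    return alpha_i, counts


rng = np.random.default_rng(2016)

# Hypothetical lengths for 50 motorways (arbitrary length units).
lengths = rng.uniform(5.0, 300.0, size=50)

# Six scenarios: two values of alpha crossed with three values of tau.
scenarios = [(a, t) for a in (-1.0, -7.0) for t in (0.3, 0.8, 1.5)]

# 1,000 simulated count vectors per scenario.
datasets = {
    (a, t): [simulate_two_level(a, t, lengths, rng)[1] for _ in range(1000)]
    for (a, t) in scenarios
}
```

Each entry of `datasets` holds the replicated count vectors for one scenario; a model fit would then be run on every replicate to obtain the estimates summarised in the results tables.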

For the three-level hierarchical model, we simulated data according to the following,

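$$\alpha_i \sim N(\alpha, \tau^2), \qquad \alpha_{ij} \sim N(\alpha_i, \tau_i^2), \qquad n_{ij} \sim \mathrm{Pois}\big(L_{ij}\exp(\alpha_{ij})\big), \qquad i = 1, \ldots, m;\; j = 1, \ldots, s_i. \tag{44}$$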

Six different simulation scenarios are considered for the three-level models, with α = −5, −7 and τ = 0.3, 0.7, 1.5.
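The three-level generating process (motorway-level log-intensities, then grouped-segment log-intensities within each motorway, then Poisson counts per grouped segment) can be sketched as follows. The function name, the segment lengths, the number of motorways and the within-motorway standard deviation `tau_i` are hypothetical choices made for illustration; the values of τ_i used in the study are not stated in this section.

```python
import numpy as np


def simulate_three_level(alpha, tau, tau_i, seg_lengths, rng):
    """Simulate one three-level data set.

    seg_lengths: list of 1-D arrays; entry i holds the grouped-segment
                 lengths L_ij of motorway i.
    tau_i:       scalar or length-m array of within-motorway standard deviations.
    """
    m = len(seg_lengths)
    tau_i = np.broadcast_to(np.asarray(tau_i, dtype=float), (m,))
    alpha_i = rng.normal(alpha, tau, size=m)  # motorway level
    counts = []
    for i, lengths_i in enumerate(seg_lengths):
        lengths_i = np.asarray(lengths_i, dtype=float)
        # Grouped-segment level: alpha_ij ~ N(alpha_i, tau_i^2).
        alpha_ij = rng.normal(alpha_i[i], tau_i[i], size=lengths_i.size)
        # Observation level: n_ij ~ Pois(L_ij * exp(alpha_ij)).
        counts.append(rng.poisson(lengths_i * np.exp(alpha_ij)))
    return alpha_i, counts


rng = np.random.default_rng(44)
# Hypothetical network: 20 motorways with 3 to 11 grouped segments each.
seg_lengths = [rng.uniform(500.0, 5000.0, size=rng.integers(3, 12)) for _ in range(20)]
alpha_i, counts = simulate_three_level(-5.0, 0.7, 0.5, seg_lengths, rng)
```

Summing `counts[i]` over segments gives motorway-level totals, which is the form of data a two-level model would see.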

The performance of the proposed models is evaluated by comparing the simulated results with the true values using three metrics: bias, mean square error (MSE) and coverage probability (CP) [37]. The bias of a parameter estimate is the difference between the average of the estimates over all simulations and the true value. The mean square error is the squared bias plus the variance of the estimated parameter, for example MSE(α̂) = Bias(α̂, α)² + Var(α̂). The coverage probability is the percentage of 95% credible intervals across the 1,000 simulated data sets that contain the true value.
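The three metrics can be computed directly from the per-replicate estimates and interval limits. The sketch below is illustrative; the function name `performance` and the toy inputs are hypothetical. It uses the population variance (divisor equal to the number of replicates), so that squared bias plus variance equals the average squared error exactly.

```python
import numpy as np


def performance(estimates, lower, upper, true_value):
    """Bias, MSE and coverage probability (in %) over simulation replicates."""
    estimates = np.asarray(estimates, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    bias = estimates.mean() - true_value
    variance = estimates.var()  # ddof=0, so bias^2 + variance = mean squared error
    mse = bias**2 + variance
    covered = (lower <= true_value) & (true_value <= upper)
    return {"bias": bias, "mse": mse, "cp": 100.0 * covered.mean()}


# Toy example: four replicates with true alpha = -7.
est = np.array([-7.1, -6.9, -7.0, -6.8])
result = performance(est, est - 0.15, est + 0.15, true_value=-7.0)
```

In the study each array would have 1,000 entries, one per simulated data set, with the interval limits taken from the 95% credible (or confidence) intervals.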

5.2. Simulation results

Table 4 shows that the two-level Bayesian hierarchical model performs better than the two-level frequentist hierarchical model in terms of bias, MSE and coverage probability. In Table 4, the bias and MSE in α and τ for the two-level frequentist hierarchical model are larger than those for the two-level Bayesian hierarchical model in scenarios with true value α = −7. The bias in τ obtained from the two-level Bayesian hierarchical model is insensitive to the choice of priors.

Table 4.

Simulation results of the two-level Bayesian and frequentist hierarchical models (models 1 and 2) under four prior distributions of τ² and the prior distribution α ∼ N(0, 10²). Time for running the simulation is recorded in seconds.

      α=−1 α=−7
  True τ Parameters Mean Bias MSE CP Time Mean Bias MSE CP Time
InvGamma(0.001,0.001) 0.3 α −0.9982 0.0018 0.0018 94.7% 11,747 −6.9967 0.0033 0.0028 94.8% 11,667
    τ 0.3008 0.0008 0.0009 95.4%   0.2963 −0.0037 0.0021 93.8%  
  0.8 α −0.9929 0.0071 0.0134 95.3% 11,679 −6.9806 0.0194 0.0142 94.7% 11,664
    τ 0.8041 0.0041 0.0066 95.7%   0.7896 −0.0104 0.0083 95.2%  
  1.5 α −0.9923 0.0077 0.0426 96.4% 18,516 −6.9442 0.0558 0.0478 95.0% 11,609
    τ 1.5019 0.0019 0.0238 95.5%   1.4470 −0.0530 0.0270 94.7%  
InvGamma(0.1,0.1) 0.3 α −0.9982 0.0018 0.0018 95.0% 11,660 −6.9980 0.0020 0.0028 95.6% 11,896
    τ 0.3072 0.0072 0.0009 95.5%   0.3097 0.0097 0.0018 95.5%  
  0.8 α −0.9927 0.0073 0.0134 95.2% 11,593 −6.9807 0.0193 0.0141 95.0% 11,691
    τ 0.8051 0.0051 0.0066 96.0%   0.7910 −0.0090 0.0082 95.6%  
  1.5 α −0.9920 0.0080 0.0426 96.1% 11,903 −6.9441 0.0559 0.0478 94.9% 11,683
    τ 1.5005 0.0005 0.0235 95.4%   1.4454 −0.0546 0.0272 94.7%  
unif(0,100) 0.3 α −0.9983 0.0017 0.0018 94.8% 12,492 −6.9975 0.0025 0.0028 95.8% 12,486
    τ 0.3050 0.0050 0.0010 95.0%   0.3044 0.0044 0.0021 94.5%  
  0.8 α −0.9927 0.0073 0.0134 95.5% 12,707 −6.9816 0.0184 0.0141 95.5% 12,432
    τ 0.8131 0.0131 0.0069 95.6%   0.8007 0.0007 0.0083 95.4%  
  1.5 α −0.9923 0.0077 0.0426 96.6% 12,580 −6.9452 0.0548 0.0478 94.9% 12,465
    τ 1.5181 0.0181 0.0246 95.2%   1.4646 −0.0354 0.0261 95.3%  
HN(0.14) 0.3 α −0.9983 0.0017 0.0018 94.9% 17,933 −6.9974 0.0026 0.0028 95.5% 18,073
    τ 0.3050 0.0050 0.0010 94.9%   0.3046 0.0046 0.0021 94.7%  
  0.8 α −0.9929 0.0071 0.0134 95.3% 17,724 −6.9816 0.0184 0.0141 95.2% 17,807
    τ 0.8129 0.0129 0.0069 95.4%   0.8006 0.0006 0.0083 95.4%  
  1.5 α −0.9924 0.0076 0.0426 96.7% 17,767 −6.9454 0.0546 0.0477 94.5% 17,828
    τ 1.5179 0.0179 0.0245 95.5%   1.4645 −0.0355 0.0261 95.2%  
Frequentist method 0.3 α −0.9982 0.0018 0.0018 94.1% 42 −6.9732 0.0268 0.0034 90.4% 42
    τ 0.2956 −0.0044 0.0009 95.6%   0.2850 −0.0150 0.0021 94.7%  
  0.8 α −0.9928 0.0072 0.0134 94.2% 40 −6.9405 0.0595 0.0166 90.5% 39
    τ 0.7902 −0.0098 0.0065 96.0%   0.7535 −0.0465 0.0096 94.9%  
  1.5 α −0.9925 0.0075 0.0426 95.9% 35 −6.8843 0.1157 0.0551 89.5% 37
    τ 1.4915 −0.0238 0.0234 95.4%   1.3723 −0.1277 0.0386 93.8%  

Note: MSE represents mean square error and CP the coverage probability.

Table 5 shows that the three-level Bayesian hierarchical model performs better than the three-level frequentist hierarchical model in terms of bias, MSE and coverage probability. The three-level frequentist approach produced a higher bias in the point estimates of α and τ across the six scenarios, and a larger MSE. For scenarios with true value α = −5, the bias of the estimated τ is sensitive to the specification of the prior distribution. For the true value α = −7, the three-level frequentist hierarchical model produced poor coverage probabilities for both parameters, with a coverage probability of 0% for α. For the true value α = −5, the coverage probabilities are better than those for α = −7, but the frequentist approach still gave lower coverage probabilities than the Bayesian approach for both parameters. Henderson et al. [20] showed that a separate analysis using the two-stage method does not perform well compared with the one-stage method. Browne and Draper [11] showed that the marginal quasi-likelihood method produced a coverage probability of zero for the random-effect variance parameter in a random-effects logistic regression model. Our simulation results show that the Bayesian hierarchical models performed better than the frequentist hierarchical models in terms of bias, MSE and coverage probability.

Table 5.

Simulation results of the three-level Bayesian and frequentist hierarchical models (models 3 and 4) under four prior distributions of τ² and the prior distribution α ∼ N(0, 10²).

      α=−5 α=−7
  True τ Parameters Mean Bias MSE CP Time Mean Bias MSE CP Time
InvGamma(0.001,0.001) 0.3 α −5.0032 −0.0032 0.0026 94.7% 559,000 −6.9940 0.0060 0.0037 93.8% 258,760
    τ 0.3001 0.0001 0.0019 95.0%   0.2937 −0.0063 0.0034 93.5%  
  0.7 α −5.0035 −0.0035 0.0116 95.6% 549,926 −7.0003 −0.0003 0.0132 95.0% 436,154
    τ 0.7004 0.0004 0.0066 96.2%   0.6941 −0.0059 0.0085 94.5%  
  1.5 α −5.0052 −0.0052 0.0500 94.8% 542,032 −7.0198 −0.0198 0.0592 93.1% 263,827
    τ 1.4937 −0.0063 0.0262 95.4%   1.5048 0.0048 0.0336 94.9%  
InvGamma(0.1,0.1) 0.3 α −5.0037 −0.0037 0.0026 95.2% 584,324 −7.0014 −0.0014 0.0037 95.9% 545,479
    τ 0.3131 0.0131 0.0018 95.8%   0.3164 0.0164 0.0025 96.3%  
  0.7 α −5.0035 −0.0035 0.0116 95.6% 577,621 −6.9944 0.0056 0.0136 94.5% 507,931
    τ 0.7025 0.0025 0.0065 96.1%   0.6946 −0.0054 0.0085 94.2%  
  1.5 α −5.0052 −0.0052 0.0501 94.5% 571,022 −7.0187 −0.0187 0.0536 95.2% 569,115
    τ 1.4920 −0.0080 0.0260 95.2%   1.5032 0.0032 0.0340 94.2%  
unif(0,100) 0.3 α −5.0035 −0.0035 0.0026 95.0% 545,730 −7.0007 −0.0007 0.0037 95.2% 736,853
    τ 0.3082 0.0082 0.0020 94.9%   0.3063 0.0063 0.0033 94.4%  
  0.7 α −5.0041 −0.0041 0.0116 95.6% 533,863 −6.9957 0.0043 0.0137 94.8% 555,450
    τ 0.7111 0.0111 0.0068 96.4%   0.7048 0.0048 0.0089 94.3%  
  1.5 α −5.0062 −0.0062 0.0504 95.5% 580,176 −7.0217 −0.0217 0.0535 95.5% 537,030
    τ 1.5133 0.0133 0.0271 95.8%   1.5287 0.0287 0.0361 93.9%  
HN(0.14) 0.3 α −5.0035 −0.0035 0.0026 95.2% 574,984 −7.0004 −0.0004 0.0037 95.2% 548,584
    τ 0.3082 0.0082 0.0020 95.0%   0.3064 0.0064 0.0033 94.1%  
  0.7 α −5.0041 −0.0041 0.0116 95.6% 584,604 −6.9957 0.0043 0.0136 94.7% 497,680
    τ 0.7111 0.0111 0.0068 96.3%   0.7050 0.0050 0.0089 94.5%  
  1.5 α −5.0061 −0.0061 0.0504 95.5% 546,481 −7.0216 −0.0216 0.0535 95.5% 583,913
    τ 1.5132 0.0132 0.0270 95.7%   1.5286 0.0286 0.0362 93.9%  
Frequentist method 0.3 α −4.9305 0.0695 0.0073 72.4% 537 −6.5961 0.4039 0.1650 0% 479
    τ 0.3082 0.0082 0.0027 81.7%   0.2650 −0.0350 0.0037 89.8%  
  0.7 α −4.9166 0.0834 0.0169 86.0% 528 −6.5297 0.4703 0.2271 0% 547
    τ 0.6602 −0.0398 0.0072 94.1%   0.4942 −0.2058 0.0472 27.1%  
  1.5 α −4.8524 0.1476 0.0593 88.0% 514 −6.3587 0.6413 0.4315 0% 549
    τ 1.3102 −0.1898 0.0532 84.7%   0.9650 −0.5350 0.3012 4.2%  

Notes: Time for running the simulation is recorded in seconds. MSE represents mean square error and CP the coverage probability.

6. Model comparisons

6.1. Model comparisons using information criteria

We compare the Bayesian models using the deviance information criterion (DIC) [42] and the Watanabe-Akaike or widely applicable information criterion (WAIC) [47].

Table 6 shows that the DIC and WAIC values for the three-level Bayesian hierarchical model are lower than those for the two-level Bayesian hierarchical model across the different priors. Since lower values of both criteria indicate a better trade-off between fit and complexity, this suggests that the three-level Bayesian hierarchical model provides a better fit to the observed data than the two-level Bayesian hierarchical model.

Table 6.

DIC and WAIC: 2LBHM represents the two-level Bayesian hierarchical model and 3LBHM represents the three-level Bayesian hierarchical model.

    Prior distribution
Model Criterion Gamma(0.001,0.001) Gamma(0.1,0.1) HN(0.14) unif(0,100) Gamma(18.36,58.06)
2LBHM DIC 100,408.3 71,999.9 71,825.6 71,950.6 71,971.7
  WAIC 89,412.6 68,222.7 68,135.6 68,210.9 68,223.2
3LBHM DIC 85,204.3 71,683.3 71,671.0 71,671.0 71,700.3
  WAIC 1049.9 888.3 895.7 20,271.2 891.3
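For readers who want to see how such criteria are obtained from MCMC output, the sketch below computes DIC and WAIC for Poisson counts with unit-specific log-intensities. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the WAIC penalty uses the common posterior-variance form, and the DIC takes the unit-level log-intensities as the parameters in focus with the posterior mean as the plug-in estimate.

```python
import math

import numpy as np

# Vectorised log-gamma from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def poisson_loglik(counts, lengths, alpha_draws):
    """Pointwise log-likelihood, shape (draws, units), for n_i ~ Pois(L_i exp(alpha_i))."""
    counts = np.asarray(counts, dtype=float)
    log_mu = np.log(np.asarray(lengths, dtype=float)) + alpha_draws
    return counts * log_mu - np.exp(log_mu) - _lgamma(counts + 1.0)


def waic(loglik):
    """WAIC = -2 * (lppd - p_waic), with p_waic the summed posterior variances."""
    n_draws = loglik.shape[0]
    lppd = np.sum(np.logaddexp.reduce(loglik, axis=0) - np.log(n_draws))
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)


def dic(counts, lengths, alpha_draws):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at posterior mean."""
    mean_dev = -2.0 * poisson_loglik(counts, lengths, alpha_draws).sum(axis=1).mean()
    post_mean = alpha_draws.mean(axis=0, keepdims=True)
    dev_at_mean = -2.0 * poisson_loglik(counts, lengths, post_mean).sum()
    return mean_dev + (mean_dev - dev_at_mean)
```

Here `alpha_draws` is an array of posterior draws with one row per retained MCMC iteration and one column per motorway (or grouped segment). Lower values of either criterion favour the corresponding model.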

6.2. Model comparisons using simulation study

Simulation Design

The term ‘misspecification’ means fitting a wrong model to the data [48]. Model misspecification affects estimation and produces biased estimates. To investigate the effects of model misspecification, we fitted the two-level Bayesian hierarchical model (1) to the same data sets that were simulated in Section 5.1 from the three-level Bayesian hierarchical model (16). We report the posterior mean, bias, mean square error and coverage probability to investigate whether model (1) is able to analyse data in the presence of heterogeneity between grouped segments. The same prior distributions as in Section 3.1.3 and the same initial values as in Section 4 are used in the Bayesian analysis, and 100,000 iterations were run with a burn-in period of 10,000 and a thinning interval of 10 to obtain posterior samples for α and τ.
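The effect of the burn-in and thinning settings on the number of retained posterior draws can be checked with a short sketch. It assumes, as one reading of the text, that the 100,000 iterations include the burn-in period; the function name and the placeholder chain are hypothetical.

```python
import numpy as np


def retained_draws(chain, burn_in=10_000, thin=10):
    """Discard the first `burn_in` iterations, then keep every `thin`-th draw."""
    return np.asarray(chain)[burn_in::thin]


# Placeholder chain standing in for 100,000 MCMC iterations of one parameter.
chain = np.arange(100_000)
kept = retained_draws(chain)
n_kept = kept.shape[0]  # (100_000 - 10_000) / 10 = 9_000 draws
```

Under this reading, posterior summaries of α and τ would each be based on 9,000 draws per fitted data set.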

Simulation Results

Table 7 shows that the two-level Bayesian hierarchical model produced biased estimates, with large mean square errors and extremely poor coverage probabilities for both model parameters. The coverage probabilities were 0 or close to 0 for τ, and equal or close to 100% for α, when the true value of the between-motorway standard deviation was τ = 0.3 or 0.7. This indicates that the two-level Bayesian hierarchical model is inadequate when the underlying model has three levels.

Table 7.

Simulation results from the two-level Bayesian hierarchical model under four prior distributions of τ² and the prior distribution α ∼ N(0, 10²). Time for running the simulation is recorded in seconds.

      α=−5 α=−7
  True τ Parameters Mean Bias MSE CP Time Mean Bias MSE CP Time
InvGamma(0.001,0.001) 0.3 α −4.8199 0.1801 0.0387 100% 20,147 −6.8121 0.1879 0.0437 99.9% 19,592
    τ 1.6270 1.3270 1.7673 0%   1.6670 1.3670 1.8768 0%  
  0.7 α −4.8165 0.1835 0.0479 99.2% 17,955 −6.8004 0.1996 0.0576 98.7% 21,103
    τ 1.7362 1.0362 1.0868 0%   1.7856 1.0856 1.1946 0%  
  1.5 α −4.8167 0.1833 0.0841 96.6% 19,410 −6.822 0.178 0.0868 97.0% 19,959
    τ 2.188 0.688 0.5142 0.37%   2.2409 0.7409 0.5989 0.34%  
InvGamma(0.1,0.1) 0.3 α −4.8197 0.1803 0.0387 100% 19,623 −6.8121 0.1879 0.0438 99.9% 20,937
    τ 1.6245 1.3245 1.7607 0%   1.6645 1.3645 1.8699 0%  
  0.7 α −4.8167 0.1833 0.0478 99.6% 18,536 −6.8002 0.1998 0.0577 98.4% 20,114
    τ 1.734 1.034 1.0821 0%   1.7829 1.0829 1.1887 0%  
  1.5 α −4.8163 0.1837 0.084 96.2% 17,885 −6.8219 0.1781 0.0865 97.1% 19,965
    τ 2.1836 0.6836 0.5079 03.7%   2.2368 0.7368 0.5926 0.35%  
unif(0,100) 0.3 α −4.8200 0.1800 0.0388 100% 19,398 −6.8129 0.1871 0.0435 100% 21,003
    τ 1.6452 1.3452 1.816 0%   1.6869 1.3869 1.9318 0%  
  0.7 α −4.8168 0.1832 0.0478 99.4% 19,788 −6.8014 0.1986 0.0573 99.0% 22,654
    τ 1.7562 1.0562 1.1289 0%   1.8075 1.1075 1.2430 0%  
  1.5 α −4.8169 0.1831 0.0837 96.7% 22,420 −6.8243 0.1757 0.086 97.4% 20,769
    τ 2.2136 0.7136 0.5511 0.31%   2.2696 0.7696 0.6437 03.0%  
HN(0.14) 0.3 α −4.8195 0.1805 0.0389 100% 23,911 −6.8129 0.1871 0.0435 100% 24,285
    τ 1.6455 1.3455 1.8169 0%   1.6869 1.3869 1.9319 0%  
  0.7 α −4.8167 0.1833 0.0478 99.5% 24,054 −6.8014 0.1986 0.0573 99.0% 24,808
    τ 1.7561 1.0561 1.1287 0%   1.8075 1.1075 1.2429 0%  
  1.5 α −4.8167 0.1833 0.0839 96.5% 24,214 −6.8243 0.1757 0.0860 97.4% 24,460
    τ 2.2134 0.7134 0.5509 0.32%   2.2697 0.7697 0.6438 3.0%  

Note: MSE represents mean square error and CP the coverage probability.

7. Discussion and conclusions

This study focused on Bayesian hierarchical models for analysing road accidents on the UK motorway network. This work helps to identify the most dangerous motorways in the UK network based on the estimated intensity of traffic accidents. These models have not been used for the UK motorway network before. We modelled the accident data at the motorway level by proposing a two-level hierarchical model that takes into account the heterogeneity across motorways. We proposed a three-level hierarchical model to incorporate the heterogeneity not only across motorways but also across grouped segments. We assumed accident intensities are homogeneous within grouped segments but heterogeneous across grouped segments. Using the proposed hierarchical models, we identified the motorways with the highest and lowest intensities of accidents, classified motorways into different risk categories, and estimated the overall intensity of accidents.

We used both Bayesian and frequentist approaches to estimate the model parameters. In the Bayesian approach, a sensitivity analysis with different prior distributions for τ² was performed to investigate the effect of the prior choice on the resulting posterior distributions of α and τ. We used non-informative, weakly informative and informative priors. In the frequentist approach, the maximum likelihood method was used separately for each level of the model.

We assessed the performance of all proposed models through a simulation study as well as a real application related to the traffic accident data on the UK motorway network in 2016. In the simulation study, different scenarios were explored. We examined three performance criteria, bias, mean square error (MSE) and coverage probability (CP) of parameter estimates. The two parameters α and τ represent the overall accident intensity and between-motorway heterogeneity, respectively. We model all motorways simultaneously using hierarchical models. The performance of different levels of hierarchical models is evaluated through the estimation of these two parameters. The simulation results showed that the performance of the two-level Bayesian hierarchical model is better than the two-level frequentist hierarchical model in terms of the bias and the coverage probability for some simulation scenarios. The performance of both models is similar in terms of mean square errors.

In the real application, the findings from the two-level Bayesian hierarchical analysis suggest that the three motorways with the highest intensity of traffic accidents are the M25, M27 and M606. The M25 has the highest intensity of traffic accidents, with an expected number of accidents of 3.59 per one kilometre. The M27 has the second highest intensity, with an expected number of accidents of 3.03 per one kilometre. The M606 is the third most dangerous motorway, with an expected number of accidents of 2.36 per one kilometre. The three motorways with the lowest intensity of accidents are the M50, M74 and M180. The M50 has the lowest intensity, with an expected number of accidents of 3.32 per 10 kilometres. The M74 has the second lowest intensity, with an expected number of accidents of 4.18 per 10 kilometres. The M180 has the third lowest intensity, with an expected number of accidents of 4.35 per 10 kilometres.

The simulation results showed that the three-level Bayesian hierarchical model performed better than the three-level frequentist model in most of the simulation scenarios. The frequentist method failed to attain the required level of actual coverage in some scenarios because of the large bias in the estimates of the overall log-intensity of accidents and the between-motorway standard deviation.

The analysis of the real data using the three-level Bayesian hierarchical model showed that the motorway with the highest intensity of accidents is the M25, where the expected number of accidents is 3.12 per one kilometre. The second highest intensity is found on the M606, with λ = 3.09 per one kilometre. The third highest intensity is found on the M27, with an expected number of accidents of 2.69 per one kilometre. In contrast, the M50, M74 and M49 have the lowest intensities of accidents, with expected numbers of accidents of 1.7, 2.7 and 3.0 per 10 kilometres, respectively. Some motorways have similar intensities of accidents: for example, both the M2 and M11 have λ = 1.39 per one kilometre, both the M53 and M55 have λ = 9.5 per 10 kilometres, both the M73 and M77 have λ = 5.8 per 10 kilometres, and both the M54 and M180 have λ = 3.2 per 10 kilometres.

Information criteria (DIC and WAIC) and a simulation study were used to compare the two-level and three-level Bayesian hierarchical models. The values of DIC and WAIC for the three-level hierarchical model are lower than those for the two-level hierarchical model. This indicates that the three-level Bayesian hierarchical model fits the data better.

Future research can investigate the overdispersion of the accident data by employing alternative models, such as negative binomial models [12], zero-inflated Poisson models [33] and extra-Poisson variation models [10]. The assumption of homogeneous accident intensity within grouped segments can be relaxed to allow the accident intensity to vary within grouped segments. The extension could also incorporate the heterogeneity arising from unobserved factors [3,9,15,38], and spatial correlations at intersections [6,22,32]. In our three-level hierarchical models, we allow the parameters of accident intensity to vary across grouped segments. Future research can investigate multivariate random parameters models to account for spatial correlations [6,8,22,32] and temporal correlations [35], as well as correlations between different types of crash [3]. The incorporation of spatial correlations in crash models has led to improved precision of parameter estimates in some data sets [6], and this can be tested for the UK motorway accident data by taking into account the spatial correlations between grouped segments. The multivariate random parameters models can also be extended to account for the correlations between explanatory variables, which may further improve the precision of the parameter estimates by capturing more of the underlying unobserved heterogeneity [15]. Variables such as geometric design, traffic conditions, environmental conditions and other variables affecting accident occurrence on a motorway can be investigated. Models can be developed to include explanatory variables that may contribute to unobserved heterogeneity arising from various sources, such as unobserved vehicle characteristics [7], driver characteristics [14,32], roadway attributes, and environmental factors [9].

Funding Statement

This work was supported by the University of Plymouth Collaborative Seed Grant on Big Data Research [Wei, 2017]. The first author was supported by a studentship from the Ministry of Higher Education and Scientific Research, Iraq [Ref: 5672], to undertake a PhD research project at the University of Plymouth.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Alarifi S.A., A bayesian multivariate hierarchical spatial joint model for predicting crash counts by crash type at intersections and segments along corridors, Accid. Anal. Prev. 119 (2018), pp. 263–273.
  • 2. Alarifi S.A., Abdel-Aty M.A., Lee J., and Park J., Crash modeling for intersections and segments along corridors: a Bayesian multilevel joint model with random parameters, Anal. Meth. Accid. Res. 16 (2017), pp. 48–59.
  • 3. Anastasopoulos P.C., Random parameters multivariate tobit and zero-inflated count data models: addressing unobserved and zero-state heterogeneity in accident injury-severity rate and frequency analysis, Anal. Meth. Accid. Res. 11 (2016), pp. 17–32.
  • 4. Ang Q.W., Baddeley A., and Nair G., Geometrically corrected second order analysis of events on a linear network, with applications to ecology and criminology, Scand. J. Stat. 39 (2012), pp. 591–617.
  • 5. Baddeley A., Rubak E., and Turner R., Spatial Point Patterns: Methodology and Applications with R, CRC Press, Boca Raton, 2015.
  • 6. Barua S., El-Basyouny K., and Islam M.T., Effects of spatial correlation in random parameters collision count-data models, Anal. Meth. Accid. Res. 5 (2015), pp. 28–42.
  • 7. Behnood A. and Mannering F., Determinants of bicyclist injury severities in bicycle-vehicle crashes: a random parameters approach with heterogeneity in means and variances, Anal. Meth. Accid. Res. 16 (2017), pp. 35–47.
  • 8. Bhat C.R., Astroza S., and Lavieri P.S., A new spatial and flexible multivariate random-coefficients model for the analysis of pedestrian injury counts by severity level, Anal. Meth. Accid. Res. 16 (2017), pp. 1–22.
  • 9. Bhowmik T., Yasmin S., and Eluru N., A multilevel generalized ordered probit fractional split model for analyzing vehicle speed, Anal. Meth. Accid. Res. 21 (2019), pp. 13–31.
  • 10. Breslow N.E., Extra-poisson variation in log-linear models, J. Royal Stat. Soc.: Ser. C (Appl. Stat.) 33 (1984), pp. 38–44.
  • 11. Browne W.J. and Draper D., A comparison of Bayesian and likelihood-based methods for fitting multilevel models, Bayesian Anal. 1 (2006), pp. 473–514.
  • 12. Chin H.C. and Quddus M.A., Applying the random effect negative binomial model to examine traffic accident occurrence at signalized intersections, Accid. Anal. Prev. 35 (2003), pp. 253–259.
  • 13. Fountas G., Anastasopoulos P.C., and Mannering F.L., Analysis of vehicle accident-injury severities: a comparison of segment-versus accident-based latent class ordered probit models with class-probability functions, Anal. Meth. Accid. Res. 18 (2018), pp. 15–32.
  • 14. Fountas G., Pantangi S.S., Hulme K.F., and Anastasopoulos P.C., The effects of driver fatigue, gender, and distracted driving on perceived and observed aggressive driving behavior: a correlated grouped random parameters bivariate probit approach, Anal. Meth. Accid. Res. 22 (2019), pp. 330–340.
  • 15. Fountas G., Sarwar M.T., Anastasopoulos P.C., Blatt A., and Majka K., Analysis of stationary and dynamic factors affecting highway accident occurrence: a dynamic correlated grouped random parameters binary logit approach, Accid. Anal. Prev. 113 (2018), pp. 330–340.
  • 16. Gelman A. and Hill J., Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, Cambridge, 2007.
  • 17. Gelman A., Roberts G.O., and Gilks W.R., et al. Efficient metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599–608.
  • 18. Haque M.M., Chin H.C., and Huang H., Applying bayesian hierarchical models to examine motorcycle crashes at signalized intersections, Accid. Anal. Prev. 42 (2010), pp. 203–212.
  • 19. Hardy R.J. and Thompson S.G., A likelihood approach to meta-analysis with random effects, Stat. Med. 15 (1996), pp. 619–629.
  • 20. Henderson R., Diggle P., and Dobson A., Joint modelling of longitudinal measurements and event time data, Biostatistics 1 (2000), pp. 465–480.
  • 21. Huang H. and Abdel-Aty M., Multilevel data and bayesian analysis in traffic safety, Accid. Anal. Prev. 42 (2010), pp. 1556–1565.
  • 22. Huang H., Chang F., Zhou H., and Lee J., Modeling unobserved heterogeneity for zonal crash frequencies: A bayesian multivariate random-parameters model with mixture components for spatially correlated data, Anal. Meth. Accid. Res. 24 (2019), pp. 100105.
  • 23. Huang H., Chin H.C., and Haque M.M., Severity of driver injury and vehicle damage in traffic crashes at intersections: a bayesian hierarchical analysis, Accid. Anal. Prev. 40 (2008), pp. 45–54.
  • 24. Huang H., Chin H., and Haque M., Empirical evaluation of alternative approaches in identifying crash hot spots: naive ranking, empirical bayes, and full bayes methods, Transport. Res. Record: J. Transport. Res. Board 2103 (2009), pp. 32–41.
  • 25. Jones A.P. and Jørgensen S.H., The use of multilevel models for the prediction of road accident outcomes, Accid. Anal. Prev. 35 (2003), pp. 59–69.
  • 26. Kim D.-G., Lee Y., Washington S., and Choi K., Modeling crash outcome probabilities at rural intersections: application of hierarchical binomial logistic models, Accid. Anal. Prev. 39 (2007), pp. 125–134.
  • 27. Klaus B., Strimmer K., and Strimmer M.K., Package ‘fdrtool’. CRAN. (2015) Available at http://www.debian.or.jp/pub/CRAN/web/packages/fdrtool/fdrtool.pdf. Accessed October 13, 2016.
  • 28. Lambert P.C., Sutton A.J., Burton P.R., Abrams K.R., and Jones D.R., How vague is vague? a simulation study of the impact of the use of vague prior distributions in MCMC using winbugs, Stat. Med. 24 (2005), pp. 2401–2428.
  • 29. Lenguerrand E., Martin J.L., and Laumon B., Modelling the hierarchical structure of road crash data-application to severity analysis, Accid. Anal. Prev. 38 (2006), pp. 43–53.
  • 30. Lesaffre E. and Lawson A.B., Bayesian Biostatistics, John Wiley & Sons, Chichester, 2012.
  • 31. Li W., Carriquiry A., Pawlovich M., and Welch T., The choice of statistical models in road safety countermeasure effectiveness studies in iowa, Accid. Anal. Prev. 40 (2008), pp. 1531–1542.
  • 32. Li Z., Chen X., Ci Y., Chen C., and Zhang G., A hierarchical bayesian spatiotemporal random parameters approach for alcohol/drug impaired-driving crash frequency analysis, Anal. Meth. Accid. Res. 21 (2019), pp. 44–61.
  • 33. Lukusa M.T. and Phoa F.K.H., A Horvitz-type estimation on incomplete traffic accident data analyzed via a zero-inflated poisson model, Accid. Anal. Prev. 134 (2020), pp. 105235.
  • 34. MacNab Y.C., A Bayesian hierarchical model for accident and injury surveillance, Accid. Anal. Prev. 35 (2003), pp. 91–102.
  • 35. Mannering F., Temporal instability and the analysis of highway accident data, Anal. Meth. Accid. Res. 17 (2018), pp. 1–13.
  • 36. Mitra S. and Washington S., On the nature of over-dispersion in motor vehicle crash prediction models, Accid. Anal. Prev. 39 (2007), pp. 459–468.
  • 37. Morris T., White I., and Crowther M.J., Using simulation studies to evaluate statistical methods, Stat. Med. 38 (2019), pp. 2074–2102.
  • 38. Pantangi S.S., Fountas G., Sarwar M.T., Anastasopoulos P.C., Blatt A., Majka K., Pierowicz J., and Mohan S.B., A preliminary investigation of the effectiveness of high visibility enforcement programs using naturalistic driving study data: a grouped random parameters approach, Anal. Meth. Accid. Res. 21 (2019), pp. 1–12.
  • 39. Quddus M.A., Modelling area-wide count outcomes with spatial correlation and heterogeneity: an analysis of London crash data, Accid. Anal. Prev. 40 (2008), pp. 1486–1497.
  • 40. Rongjie Y. and Abdel-Aty M., Using hierarchical bayesian binary probit models to analyze crash injury severity on high speed facilities with real-time traffic data, Accid. Anal. Prev. 62 (2014), pp. 161–167.
  • 41. Shankar V., Albin R., Milton J., and Mannering F., Evaluating median crossover likelihoods with clustered accident counts: an empirical inquiry using the random effects negative binomial model, Transport. Res. Record: J. Transpor. Res. Board 1635 (1998), pp. 44–48.
  • 42. Spiegelhalter D.J., Best N.G., Carlin B.P., and Van Der Linde A., Bayesian measures of model complexity and fit, J. Royal Stat. Soc.: Ser. B (Stat. Methodol.) 64 (2002), pp. 583–639.
  • 43. The Department for Transport. Available at https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data
  • 44. The Department for Transport. Available at http://data.dft.gov.uk.s3.amazonaws.com/road-traffic/all-traffic-data-metadata.pdf
  • 45. The Department for Transport. Available at https://roadtraffic.dft.gov.uk/about
  • 46. Thompson S.G., Smith T.C., and Sharp S.J., Investigating underlying risk as a source of heterogeneity in meta-analysis, Stat. Med. 16 (1997), pp. 2741–2758.
  • 47. Watanabe S., Asymptotic equivalence of bayes cross validation and widely applicable information criterion in singular learning theory, J. Mach. Learn. Res. 11 (2010 Dec), pp. 3571–3594.
  • 48. Yoo W. and Slate E.H., A simulation study of a bayesian hierarchical changepoint model with covariates. Technical report, Center for Applied Mathematics and Statistics, New Jersey Institute of Technology, 2005.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
