Summary
The introduction of spatial and temporal frailty parameters in survival models furnishes a way to represent unmeasured confounding in the outcome of interest. Using a Bayesian accelerated failure time model, we are able to flexibly explore a wide range of spatial and temporal options for structuring frailties as well as examine the benefits of using these different structures in certain settings. A setting of particular interest for this work involved using temporal frailties to capture the impact of events of interest on breast cancer survival. Our results suggest that it is important to include these temporal frailties when there is a true temporal structure to the outcome and including them when a true temporal structure is absent does not sacrifice model fit. Additionally, the frailties are able to correctly recover the truth imposed on simulated data without affecting the fixed effect estimates. In the case study involving Louisiana breast cancer-specific mortality, the temporal frailty played an important role in representing the unmeasured confounding related to improvements in knowledge, education, and disease screenings as well as the impacts of Hurricane Katrina and the passing of the Affordable Care Act. In conclusion, the incorporation of temporal, in addition to spatial, frailties in survival analysis can lead to better fitting models and improved inference by representing both spatially and temporally varying unmeasured risk factors and confounding that could impact survival. Specifically, we successfully estimated changes in survival around the time of events of interest.
Keywords: Accelerated failure time, Breast cancer, Event impact, Survival, Spatio-temporal
1. Introduction
Disease mapping by way of spatial frailty models has become commonplace in survival analysis (Banerjee and others, 2003; Bastos and Gamerman, 2006; Henderson and others, 2002; Li and Ryan, 2002; Silva and Amaral-Turkman, 2005). The introduction of spatial frailty terms into these models offers a way to represent unmeasured risk factors with geographic structure that are related to the outcome of interest in the survival regression model. Typical examples of these unmeasured risk factors include: health disparities, access to care, or environmental exposure. Beyond spatially varying unmeasured risk factors, there could also be temporally or spatio-temporally varying unmeasured risk factors. The case study explored here was a motivating example that involved breast cancer-specific (BrCa) mortality in Louisiana during the years 2000 to 2013. In addition to a general increase in BrCa related knowledge, education, and disease screenings, two notable events, Hurricane Katrina (HK, August 2005) and the passing of the Affordable Care Act (ACA, March 2010), occurred during this time, and we believed they had the ability to impact the survival of the women. It was our belief that a hurricane could cause women to have delayed breast cancer diagnosis, less access to care, and/or worse quality of care, among other things, following the storm; beyond that, we felt that this event could impact coastal regions more severely. Additionally, the Affordable Care Act offered insurance to those that might have remained uninsured in other circumstances and potentially led to earlier BrCa diagnoses and better care. Previous work demonstrated the benefits of including spatial frailty terms for the modeling of these data (Carroll and others, 2017). 
Incorporating temporal and spatio-temporal random effects in other disease mapping areas has proved to be advantageous (Batista and Antn, 2013; Carroll and others, 2016; Lawson and others, 2017; Li and others, 2012; Waller and others, 1997), and we believed that it also had a place in survival analysis.
There has been some previous work performed in the realm of spatio-temporal survival analysis. Two case studies offered some examples for employing spatio-temporal frailties in a semiparametric Cox model (Banerjee and Carlin, 2003; Banerjee and others, 2003). While these case studies illustrated the importance of including spatio-temporal frailties, without a simulation study, it was difficult to assess the performance of these frailties. Onicescu and others (2017) also offered a complex spatio-temporal structure that assumed a dependency between space and time for an accelerated failure time (AFT) model. This complex methodology offered flexibility; however, the spatial and temporal components were not easily separated for interpretation. Additionally, a dependence between space and time might not always be necessary.
The methodology employed in this exploration combined and extended the previously executed spatio-temporal survival analyses. First, our model was a Bayesian AFT model (Christensen and Johnson, 1988) that assumed a standard logistic distribution for the error term. This model has recently increased in popularity as it offers advantages compared to the commonly used Cox proportional hazards model in that it is not necessary to assume proportional hazards and the fixed and random effects have a direct relationship with the logarithm of time (Onicescu and others, 2017; Orbe and others, 2002; Zhang and Lawson, 2011). Second, the linear predictors of interest incorporated fixed effects as well as flexible frailty structures such as spatial frailties alone, additive spatial and temporal frailties, or spatio-temporal interaction frailties. These linear predictor definitions could have been ideal in any given setting; however, we wished to explore the effectiveness of temporally dependent parameters for survival data and little exploration has occurred in this area thus far, particularly related to one or more events that have the potential to alter the survival experience. These methods were tested via a simulation study accompanied by a real data case study.
2. Statistical methods
We considered disease mapping for $J$ predefined small areas across $K$ units of time. For subject $i$ in small area $j$ and time unit $k$, we observed $(t_{ijk}, \delta_{ijk})$, where $t_{ijk} = \min(T_{ijk}, C_{ijk})$ was the survival time such that $T_{ijk}$ was the true survival time, $C_{ijk}$ was the censoring time, and $\delta_{ijk}$ was the censoring indicator. The time scale associated with subscript $k$ differed from the time scale of the survival outcome. The primary survival time $t_{ijk}$ was such that an individual had their own time zero, e.g., time since diagnosis for the case study, while the $K$ time units could be defined as calendar time to capture yearly trends or influence of a major event, e.g., following HK or the passing of the ACA in the motivating example. Thus, the AFT model was expressed as follows for the $i$th individual in spatial area $j$ and temporal unit $k$:

$$\log(t_{ijk}) = \mu_{ijk} + \sigma\,\varepsilon_{ijk}, \tag{2.1}$$

where $t_{ijk}$ was individual survival time, $\mu_{ijk}$ was a linear predictor that incorporated the fixed and random effects, $\varepsilon_{ijk}$ were random errors, and $\sigma$ was a scale parameter. The common distributional assumptions for the errors include: Weibull, standard logistic, and standard normal. Here, we assumed a logistic distribution as it did not require the proportional hazards assumption, which was a major advantage of the AFT, and it offered a closed form expression of the survival and hazard functions (Collett, 2013). The proposed method applies to all alternative distributional assumptions. For the scale parameter $\sigma$, we assumed a flat, uniform prior distribution that ranged from 0.01 to 10 (Christensen and Johnson, 1988), which is a common choice for this scale parameter. Additionally, a sensitivity analysis presented in the supplementary material available at Biostatistics online was performed and suggested that different error distribution assumptions lead to nearly identical posterior estimates, particularly for appropriately specified models. The corresponding survival and density functions can be expressed as follows:

$$S(t_{ijk}) = \frac{1}{1 + \exp(\varepsilon_{ijk})}, \qquad f(t_{ijk}) = \frac{\exp(\varepsilon_{ijk})}{\sigma\, t_{ijk}\,\{1 + \exp(\varepsilon_{ijk})\}^{2}},$$

where $\varepsilon_{ijk} = \{\log(t_{ijk}) - \mu_{ijk}\}/\sigma$ by rearranging equation 2.1 above. Following this, the likelihood was defined using: $\mathbf{t}$ as a vector containing all $n$ survival times, $\mathbf{X}$ as a matrix containing individual level covariate, spatial, and temporal information, and $\boldsymbol{\theta}$ a vector containing all model parameters. This likelihood assumed conditional independence and was written as follows:

$$L(\mathbf{t} \mid \mathbf{X}, \boldsymbol{\theta}) = \prod_{i,j,k} f(t_{ijk})^{\delta_{ijk}}\, S(t_{ijk})^{1 - \delta_{ijk}}. \tag{2.2}$$
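As an illustration of the likelihood described above, the following is a minimal Python sketch of the log-likelihood for an AFT model with standard logistic errors and right censoring. It is not the OpenBUGS code used for the analyses. The function name and argument names are hypothetical, and the standardized residual is written as `z = (log t - mu) / sigma`.

```python
import math

def loglogistic_aft_loglik(times, events, mu, sigma):
    """Log-likelihood of an AFT model with standard logistic errors.

    times  : observed times (event or censoring), all > 0
    events : 1 if the event was observed, 0 if right-censored
    mu     : linear predictor value for each observation
    sigma  : scale parameter, > 0
    (All names are illustrative assumptions, not the authors' code.)
    """
    total = 0.0
    for t, d, m in zip(times, events, mu):
        z = (math.log(t) - m) / sigma
        # log S(t) = -log(1 + exp(z)), computed in a numerically stable way
        log_surv = -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))
        if d == 1:
            # log f(t) = z - log(sigma) - log(t) - 2 * log(1 + exp(z))
            total += z - math.log(sigma) - math.log(t) + 2.0 * log_surv
        else:
            # censored observations contribute only the survival function
            total += log_surv
    return total
```

Observed events contribute the log density and censored observations contribute the log survival function, which matches the conditional independence form of the likelihood.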
2.1. Spatial frailty models
When we considered spatial frailties alone, the definition of the AFT model only needed to be parameterized for individual
in spatial area
, constant across time unit
, such that
,
, was defined as:
![]() |
where
represented the fixed effect parameter estimates associated with individual known, important risk factors and
was the spatial frailty term that represented the difference between means of a specific spatial unit and the population. The prior distribution assumed for each of the
,
fixed effect parameter estimates were assumed independent and such that
. The spatial frailty term
could be defined as either uncorrelated (
) or a convolution of uncorrelated and correlated (
) heterogeneity. The uncorrelated frailty,
, was defined with
prior distribution, which allowed spatial units to be independent conditional on
. Alternatively, the correlated frailty,
, was defined by a conditional autoregressive (CAR) prior such that
(Besag and Green, 1993), where for the correlated frailty,
is the number of first degree neighbors for parish
; thus, a parish
is only directly related to its nearest
neighbors. Finally, the spatial frailty precisions (
) and all precision parameters henceforth were defined by the following general uninformative prior distribution:
. A sensitivity analysis displayed in the supplementary material available at Biostatistics online suggested that the results were robust to different prior assumptions. We denoted this spatial frailty only model as
.
2.2. Additive spatial and temporal frailty models
Next, we extended the spatial frailty only model
to incorporate temporal variations. There were multiple avenues for incorporating a temporal structure within the frailty terms. The first we examined was an additive form of spatial and temporal frailties; these additive models assumed that the spatial variation was constant across time and the temporal variation constant across space. So,
was defined as
where
,
were defined as in Section 2.1 and
was the temporal frailty term. Based on our motivating example, we considered two decompositions of the temporal frailty
:
, where
for
with
as some meaningful temporal unit across the study period (e.g. by calendar year to represent the advances in screening and BrCa knowledge from one year to the next) and
, where
for
and
such that
was as in
and
denoted the time unit defined by change points based on the effects of influential events within the study period. As discussed before, these events could range from natural disasters to government legislation to changes in treatment procedures, where the first two were considered in our real data case study.
We could define change points to allow the effect of events to start immediately or have a lagged start. The lag time choice was especially important for natural disasters, e.g., allowing the effect related to HK to begin one or more years following the storm. Individuals had
(
) if they were diagnosed prior to the change point for the first event of interest,
if they were diagnosed between the change points for the first and second events, up to
if they were diagnosed after the change point for the last event. When there are multiple events with overlapping time windows of impact, we can similarly define the change points to reflect the impact of all events.
For the event-related parameter, many different specifications could arise. The first specification that we considered was called “constant;” for this definition, there was a change point related to each event, of which there could be one or more. This definition was such that all those who entered the study following the change point for a given event were considered impacted by that event. As an example using HK, the change point could be September 2005; thus, women diagnosed before September 2005 had
and those diagnosed after had
. The next specification that we explored was called “jump and return” wherein two change points and an estimate defined a window of time for the event to have an impact. Using HK as an example, the window of impact for this event could be September 2006 to August 2008 such that women diagnosed between these time points had
. All other women had
. Finally, a “trend” option represented the progression of survival experience following an event. Here, the trend had several change points with a new estimate that corresponded to each. For an example related to HK, the trend parameter could produce four estimates corresponding to women who were diagnosed in the first year (September 2005–August 2006), second year (September 2006–August 2007), third year (September 2007–August 2008), and then the rest of the study time (September 2008 and later) following the storm. The trend frailty, along with
, was used for identifying the appropriate event-frailty lag time and change points for the constant and jump and return options. For this manuscript, we predefined these change points to reflect our beliefs about the influential events.
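The three event-related specifications above amount to different ways of mapping a diagnosis date to an event index. The sketch below illustrates that mapping in Python. The function names, the 0/1 index coding, and the treatment of each change point as the first day of a month are assumptions made for illustration and are not taken from the authors' code.

```python
from datetime import date

def constant_index(dx, change_point):
    """'Constant': 0 before the change point, 1 from the change point onward."""
    return 1 if dx >= change_point else 0

def jump_and_return_index(dx, start, end):
    """'Jump and return': 1 inside the window [start, end), 0 otherwise."""
    return 1 if start <= dx < end else 0

def trend_index(dx, change_points):
    """'Trend': number of change points already passed (0 = before the first)."""
    return sum(dx >= cp for cp in change_points)

# Illustrative change points for HK (assumed to fall on the first of the month)
hk_constant = date(2005, 9, 1)
hk_window = (date(2006, 9, 1), date(2008, 9, 1))
hk_trend = [date(2005, 9, 1), date(2006, 9, 1), date(2007, 9, 1), date(2008, 9, 1)]

diagnosis = date(2006, 10, 15)
indices = (
    constant_index(diagnosis, hk_constant),
    jump_and_return_index(diagnosis, *hk_window),
    trend_index(diagnosis, hk_trend),
)
```

A lagged start corresponds to shifting the change point later, for example by one year, without changing the indexing functions.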
A random walk prior distribution was used to represent the non-event-defined temporal frailty parameter such that
with
. This random walk structure imposed an ideal temporal correlation due to the sequential nature of time. The prior distribution of the constant and jump and return event-defined temporal frailties was as follows:
with
. An intuitive prior distribution for the temporal trend frailty was also a random walk to impose correlation in time. The change points for all event-related frailties were based on the timing of the event, e.g., the month of September for HK, and this minimized identifiability issues between the non-event- and event-related frailties. There are many potential combinations and specifications related to this temporal frailty, and we addressed those believed to be most appropriate for these data. However, other specifications could be important in other circumstances, and this flexible modeling framework can handle other specifications. Details about the specifications used in the case study are defined in Section 4.2.
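To make the random walk structure concrete, the sketch below draws one realization of a first-order Gaussian random walk, in which each value is centered on the previous one. The starting value, the standard deviation, and the function name are hypothetical choices for illustration. The priors in the fitted models are specified in OpenBUGS rather than simulated this way.

```python
import random

def simulate_random_walk(n_steps, sd, start=0.0, seed=None):
    """Draw a first-order Gaussian random walk of length n_steps.

    Each value is normally distributed around the previous value with
    standard deviation sd (an assumed, illustrative parameterization).
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps - 1):
        path.append(rng.gauss(path[-1], sd))
    return path

# Example: one draw for 14 calendar years with a hypothetical sd of 0.1
yearly_frailty = simulate_random_walk(14, 0.1, seed=1)
```

Because each term depends on its predecessor, neighboring time units receive similar values, which is the temporal correlation the prior is meant to impose.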
2.3. Spatio-temporal frailty models
A second technique for incorporating temporal structure via frailties into the linear predictor involved including a spatio-temporal interaction parameter (Knorr-Held, 2000; Wikle and others, 1999; Banerjee and Carlin, 2003; Banerjee and others, 2003); even more complex structures could be assumed here, e.g., also including additive spatial and/or temporal terms. In this model, survival experience differed per space and time. The subsequent linear predictor (
) was defined as follows:
![]() |
We considered two definitions of
. The first,
, assumed frailties that were correlated in time, uncorrelated in space defined as
. We let the precision parameter vary by
as this allowed the spatio-temporal interaction to accommodate separate precision parameters per study year to potentially represent the impact of an event during year
. The other,
, employed a multivariate conditional autoregressive (MCAR) model which furnished a frailty structure that was correlated in space and time (Banerjee and Carlin, 2003) defined as
where
were as in Section 2.1,
was a
matrix that contained adjacency information, and
was a
matrix that represented the conditional variance matrix. This definition was structured as in Banerjee and others (2003) where they go on to abbreviate the distribution by dictating that
, and this is the definition we will use to refer to this distribution from here on. The precision matrix
followed a Wishart distribution with degrees of freedom
and parameter matrix
where
was the identity matrix with dimension
. To complete the appropriate specification for the MCAR model, an intercept was included for each
such that
where
follows the “dflat” prior distribution, a very wide, flat prior distribution. This spatial correlation could be important when considering the impact of events on survival as certain events, e.g., natural disasters, could impact spatial regions differently.
2.4. Notation and summary of fitted models
The fitted models described in Sections 2.1–2.3 are summarized in Table 1. These fitted models covered a wide range of options available for spatial and spatio-temporal specification within the AFT modeling framework and were defined in such a way that all simpler models are special cases of the more complex one(s); thus,
is a special case of
which is a special case of
, and so on. Note, we only considered the uncorrelated spatial frailty in this simulation since (1) the focus of this project was on the temporal frailties and (2) this was the best spatial frailty structure for previous spatial only explorations involving the case study data (Carroll and others, 2017). All the methods could accommodate a correlated spatial frailty parameter.
Table 1.
Summary of fitted models
| Fixed | Spatial | Temporal | Spatio-temporal | ||||
|---|---|---|---|---|---|---|---|
| Fitted Models |
|
|
|
|
|
|
|
|
✓ | ✓ | |||||
|
✓ | ✓ | ✓ | ||||
|
✓ | ✓ | ✓ | ✓ | |||
|
✓ | ✓ | ✓ | ||||
|
✓ | ✓ | ✓ | ||||
The ✓ represents what is included in a given model. For example,
has four checks representing the model contents such that
.
2.5. Model comparison and evaluation tools
The model goodness-of-fit was evaluated using the standard measure of deviance information criterion (DIC) (Spiegelhalter and others, 2002). However, when some of the simulation scenarios were fitted with the MCAR model, a negative effective number of parameters estimate (pD) was produced; this estimate was needed for the DIC calculation but nonsensical when the value was below zero. Typically, negative estimates arise because of a strong prior-data conflict, and this type of result is understandable when an overly complex model is fitted. We adopted the slightly more conservative Gelman and others (2004) alternative calculation using
where
was the posterior deviance from the OpenBUGS sampler as this prevented a nonsensical negative pD estimate. Hence, the DIC calculation was such that
for all scenarios.
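A minimal sketch of this type of DIC calculation is given below. It assumes that the alternative pD is half the posterior variance of the sampled deviance and that DIC is the posterior mean deviance plus pD. The function name is hypothetical, and the input is taken to be a list of deviance values saved from the sampler.

```python
from statistics import mean, variance

def dic_from_deviance(deviance_samples):
    """Return (DIC, pD) from posterior deviance samples.

    Assumes pD = var(D) / 2 and DIC = mean(D) + pD; because a variance
    cannot be negative, this pD cannot fall below zero.
    """
    d_bar = mean(deviance_samples)
    p_d = 0.5 * variance(deviance_samples)
    return d_bar + p_d, p_d
```

The variance here is the sample variance of the saved deviance values, so at least two samples are required.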
The models' recovery abilities were evaluated via bias squared calculations in the simulation study. These bias squared calculations used the posterior mean to get an estimate of
(notated as
), calculated the bias squared per simulated data set (
), and then averaged the bias squared calculation over all simulated data sets. This type of calculation was also performed only relating to the frailty estimates such that, for example with
,
and
were used in place of
and
, respectively.
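The bias squared calculation described above can be sketched as follows. The sketch assumes that, within each simulated data set, the squared differences between the posterior mean estimates and the true values are averaged over elements before averaging over data sets. Whether the original calculation averaged or summed within a data set is not stated here, so that choice and the function name are assumptions.

```python
def average_bias_squared(estimates_per_dataset, truth):
    """Average squared bias over simulated data sets.

    estimates_per_dataset : list of lists, posterior means per data set
    truth                 : list of true values, same length as each inner list
    """
    per_dataset = []
    for estimates in estimates_per_dataset:
        squared = [(est - tru) ** 2 for est, tru in zip(estimates, truth)]
        per_dataset.append(sum(squared) / len(squared))
    return sum(per_dataset) / len(per_dataset)
```

The same function applies to the frailty-only version of the calculation by passing the frailty estimates and the simulated frailty values instead.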
Maps, plots, and secondary assessments were also useful tools of evaluation. In the simulation study, we calculated and plotted measures of bias squared to assess simulation ground truth recovery. Finally, we produced maps and plots of the spatial and temporal frailty estimates for (1) simulation ground truth comparison and (2) assessment in the real data case study. The case study evaluation was taken a step further by quantitatively assessing the spatial and temporal frailties via comparisons with risk factors available at the same spatial and temporal resolution (Carroll and Zhao, 2018).
2.6. Computational techniques
These analyses were accomplished using RStudio version 0.99.902 and R version 3.3.1. Specifically, the package R2OpenBUGS which calls the Bayesian inference software OpenBUGS from R was utilized for inference (Carroll and others, 2015; Lunn and others, 2013; Team, 2015; Thomas and others, 2014; 2006). To execute the AFT model in OpenBUGS, the zeros trick as described in the BUGS manual was employed to obtain the correct likelihood contribution since this likelihood was not among the standard distributions. Finally, the R package fillmap, which is available via GitHub, was utilized for producing maps and performing the quantitative secondary assessments (Carroll and Zhao, 2018; Carroll, 2016).
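The zeros trick relies on a simple identity: a Poisson observation of zero with mean $\phi$ has probability $\exp(-\phi)$, so setting $\phi = -\log L + C$ for a constant $C$ yields a contribution proportional to the desired likelihood $L$. The Python sketch below only checks that identity numerically. It is not BUGS code, and the function names and the value of the constant are hypothetical.

```python
import math

def zeros_trick_mean(log_lik, constant):
    """Poisson mean phi = -log L + C; C must be large enough that phi > 0."""
    phi = -log_lik + constant
    if phi <= 0.0:
        raise ValueError("constant too small: Poisson mean must be positive")
    return phi

def poisson_zero_log_prob(phi):
    """log P(Y = 0) for Y ~ Poisson(phi)."""
    return -phi

# The log-probability of the pseudo-observation equals log L minus the constant
log_lik = -3.2
contribution = poisson_zero_log_prob(zeros_trick_mean(log_lik, 10.0))
```

Because the constant is the same for every observation, it shifts the log-likelihood without changing the posterior.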
In the OpenBUGS sampler, the following updaters were utilized for the given parameters: adaptive metropolis mixed block for
,
,
,
, and
; standard adaptive metropolis block for
and
from
; wrapper for chain graph for
and
from
; conjugate Wishart for
; and slice for the standard deviations of the random effects. Example R and BUGS code is available in the supplementary material available at Biostatistics online. Additionally, an R shiny application for the case study is available on GitHub from user: carrollrm in repository: LAmortBCaShiny.
3. Simulation study
3.1. Assumptions
For these simulation studies, we assumed an area that contained 25 spatial regions on a $5 \times 5$ grid. We further assumed that all individual diagnoses ($n = 1200$) occurred in a span of 6 years with 200 per year and at least one per spatial region for each year. A month of diagnosis variable was also randomly generated.
For the fixed effects, there was one standardized continuous (
,
), one dichotomous (
,
), one categorical (
,
,
) covariate, and the intercept
was set to be 2.5. These were included in all simulation scenarios. For the MCAR model,
. For the random effects, spatial, temporal, and spatio-temporal structures were assumed where the indexing in space was
for the spatial regions within the grid and time was related to diagnosis year,
, and the event which occurred at month six of year three,
. Specifically for random effects, we assumed an uncorrelated spatial variation (
), two possibilities for a temporal variation: (1) an annual temporal random walk (
with
for
) and (2) an annual temporal random walk plus an event-related temporal parameter that followed the “constant” specification via a single change point (
where
with
for
,
, and
), and two alternatives for spatio-temporal variation: (1)
and (2)
where
. A single realization of these random effects was assumed for all simulated data sets in combinations to reflect the five fitted models described in Sections 2.1–2.4.
To produce the 50 simulated data sets under each of these scenarios, a simulated
was calculated based on the specifications above. With that value, a survival time (in months) was calculated as in equation 2.1 with
and an assumed error,
, that was simulated separately for each of the 50 simulated data sets to offer variation within the simulation scenario. A censoring time was also generated from an exponential distribution such that roughly 90% of individuals were censored. This amount of censoring closely resembles the case study data as well as breast cancer statistics on survival (American Cancer Society, 2016).
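The data-generating steps above can be sketched in Python as follows: draw a standard logistic error, convert the linear predictor to a survival time through the AFT relationship, and compare it with an exponential censoring time. The scale value, censoring rate, and function name below are hypothetical. They are not the values used in the simulation study and would need tuning to reach roughly 90% censoring.

```python
import math
import random

def simulate_aft_times(mu, sigma, censor_rate, seed=None):
    """Simulate (observed time, event indicator) pairs from a logistic AFT model.

    mu          : linear predictor value for each individual
    sigma       : scale parameter (hypothetical value supplied by the caller)
    censor_rate : rate of the exponential censoring distribution (hypothetical)
    """
    rng = random.Random(seed)
    data = []
    for m in mu:
        u = rng.random()
        while u == 0.0:  # avoid log(0) in the inverse CDF below
            u = rng.random()
        eps = math.log(u / (1.0 - u))          # standard logistic draw
        true_time = math.exp(m + sigma * eps)  # log(t) = mu + sigma * eps
        cens_time = rng.expovariate(censor_rate)
        event = 1 if true_time <= cens_time else 0
        data.append((min(true_time, cens_time), event))
    return data

# Example with the intercept of 2.5 and otherwise illustrative settings
sample = simulate_aft_times([2.5] * 200, sigma=0.5, censor_rate=0.5, seed=42)
```

Re-running the function with a different seed for each simulated data set gives the between-data-set variation in the errors while the linear predictor stays fixed.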
3.2. Simulation study results
Table 2 includes the model goodness-of-fit and recovery evaluation measures from the simulation study data scenarios that reflect all simulated data sets and fitted models. Over all scenarios, the true models performed very well, all with low DIC and bias. In terms of identifying the correct model, DIC was generally sufficient, but when two DICs were close, bias squared was useful. Further, the goodness-of-fit and recovery estimates largely agreed and were less definitive for the non-additive scenarios. Ultimately, these results suggested that failing to include a temporal frailty when needed (i.e., fitting
to any of the simulated data sets not simulated as in
) led to poor model fits and biased estimates; additionally, including a temporal frailty with simple structure when it was unneeded (i.e., data simulated as in
fit with
) did not sacrifice model goodness-of-fit or recovery. Both measures did well to identify if the frailty structure should be additive (
,
) or an interaction (
,
). Figure 2.1 of the supplementary material available at Biostatistics online displays plots of bias squared relating to the frailty portion of the simulated data scenario models. The bias squared estimates indicated how well the models recovered the assumed frailties for each simulation scenario. These results differed from those in Table 2 in that they only related to the frailty estimates. However, they agreed with DIC and some overall bias squared estimates in that the bias squared appeared to appropriately indicate if the frailty should be additive spatial and temporal terms or an interaction. Further, these results also suggested that including temporal frailties of any type when unnecessary does not largely increase the bias squared, aside from
. Additionally, regardless of the frailty specification, the fixed effect estimates were recovered fairly well for all models and scenarios (Table 2.1 of the supplementary material available at Biostatistics online); however, the further the fitted model was from the true model in terms of specification, the more biased the estimates became.
Table 2.
Model goodness-of-fit for each simulation scenario averaged over the 50 simulated data sets
| Model | Parameterization | DIC | pD | Bias squared |
|---|---|---|---|---|
| Data simulated as in | | | | |
| | | 4263.15 | 31.11 | 7.79 |
| | | 4266.42 | 33.59 | 7.80 |
| | | 4269.62 | 34.78 | 11.54 |
| | | 4290.03 | 70.19 | 7.98 |
| | | 4361.11 | 161.04 | 935.08 |
| Data simulated as in | | | | |
| | | 5145.22 | 31.11 | 20.06 |
| | | 4401.93 | 36.24 | 7.79 |
| | | 4419.90 | 38.73 | 7.75 |
| | | 4567.28 | 189.93 | 8.19 |
| | | 4509.07 | 158.68 | 935.67 |
| Data simulated as in | | | | |
| | | 7248.81 | 29.12 | 988.68 |
| | | 5043.69 | 34.62 | 204.13 |
| | | 2889.37 | 37.99 | 8.52 |
| | | 4814.55 | 117.13 | 191.50 |
| | | 4827.53 | 134.36 | 1143.08 |
| Data simulated as in | | | | |
| | | 5066.11 | 30.66 | 19.53 |
| | | 5048.95 | 35.44 | 18.94 |
| | | 5050.77 | 37.05 | 21.88 |
| | | 4412.46 | 17.84 | 8.01 |
| | | 4409.86 | 180.26 | 935.08 |
| Data simulated as in | | | | |
| | | 7147.41 | 31.73 | 151.42 |
| | | 6287.31 | 35.51 | 73.61 |
| | | 6287.27 | 37.32 | 72.94 |
| | | 4482.51 | 191.04 | 10.47 |
| | | 4479.24 | 204.54 | 9.76 |
Bold highlighted estimates indicate models comparable to the best in terms of fit and recovery. Close for DIC is 3–4 units or less and close for bias squared is a 5% difference.
Figure 1 displays the temporal frailty estimates related to the data simulated as in
and fitted with models
and
. All other true temporal and spatio-temporal frailties in the simulation and the corresponding estimates are included in Figures 2.2–2.14 of the supplementary material available at Biostatistics online. All these displays suggested the true simulation model frailties were recovered well under appropriate models. Specifically, both
and
recovered the truth for the data simulated as in
. For the data simulated as in
, the effect of the event was separable and represented accurately with
by including
. Fitted model
attempted to recover the truth by giving an averaged estimate for the year in which the event occurred and, thus, was not able to recover the impact of the event as well as the true model,
. We also explored events that occurred at month one of year four, i.e.,
occurred at the same time point as
, and of different magnitudes. From these, it was apparent that
could accurately recover the impact of these events; however,
had an advantage through the ability to give an accurate estimate directly related to the event of interest, when the impact was strong enough. Similarly, the spatial frailty estimates from
,
, and
reflected the 6 year average of the simulation truth for
and
(Figures 2.4 and 2.5 of the supplementary material available at Biostatistics online).
Fig. 1.
Temporal frailty simulation assumption (
) and estimates (
and
).
4. Louisiana SEER BrCa-specific mortality case study
The case study illustrated these fitted models' abilities under a real situation rather than the ideal one laid out in the simulation study. Explicitly, this case study explored BrCa-specific mortality in Louisiana, USA for years 2000 to 2013. These data were previously explored in the spatial only setting (Carroll and others, 2017). Those results suggested that spatial frailties were important and explained survival differences that were independent of the individual-level risk factors adjusted for, but we believed this could be further improved by including temporal frailty terms.
4.1. Case study data
The data for this case study were obtained from the publicly available SEER data sets (release date: 15 April 2016), which offered a cause of death variable to indicate if the individual's underlying cause of death was BrCa as well as survival time in months. In addition to this information, diagnosis month and year were recorded along with the FIPS county code for each woman. These together created the spatial and temporal survival scenario that we were interested in exploring.
The SEER data also provided several clinical and demographic risk factors of known association with BrCa mortality (American Cancer Society, 2016; Surveillance, Epidemiology, and End Results Program, 2015; Wieder and others, 2016). By including these available risk factors, the spatial, temporal, and spatio-temporal frailties represented latent combinations of unmeasured risk factors beyond these. Explicitly, the known risk factors of interest included: African American race, marital status at diagnosis, age at diagnosis (standardized to have a mean of 0 and standard deviation of 1), cancer grade, ER/PR tumor subtype status, BrCa surgery, and radiation therapy.
This real data case study furnished a situation for examining the use of temporal and spatio-temporal frailties in relation to risk factors and external events that could alter the survival time. HK made landfall on 29 August 2005 and moved through the state of Louisiana during the time of this case study. Additionally, the ACA, nationwide legislation that offered affordable health care to all citizens, was signed into law by President Barack Obama on 23 March 2010. Finally, knowledge, education, and screenings related to BrCa improved over these years, which could influence breast cancer mortality; thus, we believed that a temporal frailty could aid in adjusting for that as well.
4.2. Case study-specific statistical model details
For this case study, we considered a single definition of
, diagnosis years, and several definitions of
. From the definitions given in Section 2.2, we considered non-lagged and lagged options for the constant (single change point) and jump and return (double change point) specifications where a lagged effect indicated that, for example, HK’s impact on BrCa-specific mortality was not detected for those diagnosed immediately after the storm, rather it began a year later. The trend specification was also considered. Here, we had estimates for five years following HK but only four years following ACA due to the end of the study. Then, from the best single change point constant options (non-lagged vs. lagged) per event, we considered the BOTH specification which combined the change points from the best models for the two individual events. The specifications included for the event-related frailty in the alternative
specifications allowed the HK-related frailty to change at the month of September while the ACA-related frailty changed at the month of April for specified years. The alternative options led to slightly different interpretations of the temporal frailty.
4.3. Case study results
The results in Table 3 were used for assessing the models’ goodness of fit. When we considered the events one at a time, the best models were ACA - constant and HK - constant lag. The DIC measures suggested that all models that accounted for ACA without a lag time were comparable and the best fitting. The fit with BOTH - constant did not appear to improve upon ACA - constant enough to warrant including an HK-related temporal frailty; thus,
ACA - constant was the best model for these data.
Table 3.
Model assessment estimates and calculations
| Model | Parameterization | DIC | pD |
|---|---|---|---|
| | | 54997.5 | 39.2 |
| | | 54930.0 | 48.6 |
| | | | |
| HK-constant | | 54930.0 | 48.0 |
| HK-constant lag | | 54928.2 | 48.4 |
| HK-jump and return | | 54930.0 | 49.1 |
| HK-trend | | 54930.0 | 49.0 |
| ACA-constant | | 54926.1 | 48.2 |
| ACA-constant lag | | 54930.0 | 49.5 |
| ACA-trend | | 54930.0 | 48.7 |
| BOTH-constant | | 54926.1 | 48.7 |
| ST1 | | 54969.4 | 95.4 |
| ST2 | | 54986.8 | 128.3 |
For the models with an event-related frailty, HK and ACA indicate Hurricane Katrina (August 2005) and the Affordable Care Act (March 2010), respectively. Constant provides a single change point for the frailty and keeps the estimate for the rest of the study years. Jump and return indicates a frailty that jumps to a new estimate for a window of time (specifically September 2007 to August 2009) and then returns to the previous parameter value (2 change points). Constant lag means that a year of lag time was allowed following the event date (1 change point). Trend indicates that the temporal frailty term offered a different estimate per year for several years following the event (4 change points for HK and 3 for ACA). BOTH includes changes for both events, where the change point for HK is lagged (2 change points). Highlighted estimates indicate models comparable to the best in terms of fit; DIC values within 3–4 units of the best are considered close.
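The comparison rule in the table note can be made concrete with a short calculation. The sketch below copies the DIC values for the event-related models from Table 3 and reports each model's difference from the smallest DIC; the function name `comparable_models` and the choice to expose the cutoff as a `threshold` argument are assumptions made for illustration, since the note gives a range (3–4 units) rather than a single value.

```python
# DIC values copied from Table 3 (event-related models only).
dic = {
    "HK-constant": 54930.0,
    "HK-constant lag": 54928.2,
    "HK-jump and return": 54930.0,
    "HK-trend": 54930.0,
    "ACA-constant": 54926.1,
    "ACA-constant lag": 54930.0,
    "ACA-trend": 54930.0,
    "BOTH-constant": 54926.1,
}


def comparable_models(dic_values, threshold):
    """Return {model: DIC difference from the best} for models within threshold."""
    best = min(dic_values.values())
    return {
        name: round(value - best, 1)
        for name, value in dic_values.items()
        if value - best <= threshold
    }


# The stricter and looser ends of the "3-4 units" rule of thumb.
strict = comparable_models(dic, threshold=3.0)
loose = comparable_models(dic, threshold=4.0)
print(sorted(strict))
print(len(loose), "of", len(dic), "models within 4 units")
```

Because the largest difference among these models is just under 4 units, the set of "comparable" models is sensitive to which end of the 3–4 unit range is used, which is worth keeping in mind when reading the table.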
Figure 2 displays the temporal frailty estimates for ACA - constant with the Louisiana SEER data. Estimates for other models are included in Figures 3.1 to 3.8 of the supplementary material available at Biostatistics online. The fixed effect estimates and spatial frailties were nearly identical for all fitted models; those associated with ACA - constant are included in Table 3.1 and Figure 3.9 of the supplementary material available at Biostatistics online. In general, the overall temporal frailty suggested an increase in survival time across the study, with a period of more constant, slightly reduced estimates beginning at about 2007 and a large increase in 2010. We believed that the overall increase represented advancements in BrCa education and screening, that the midway leveling off could have been due to a small but inseparable impact from HK, and that the large increase was related to the passing of the ACA. The event-related frailty
estimate from the best model suggested that the change in 2010 was important to represent as a separate event-related temporal frailty. Figure 3.4 of the supplementary material available at Biostatistics online displays the temporal frailty estimate for ACA - trend; these estimates suggested that we need not fit the ACA - constant lag or ACA - jump and return models, since the estimates appeared to change immediately and did not diminish prior to the end of the study time. This was supported by the DIC estimate for ACA - constant lag showing no improvement over that of the additive model with only the overall temporal frailty. The spatio-temporal interaction frailties also appeared to have a slight decline and/or leveling off in the years 2007–2009 and an increase around 2010 (Figures 3.7 and 3.8 of the supplementary material available at Biostatistics online), but the DICs suggested that these more complicated models did not offer improvements in model fit.
Fig. 2. Temporal frailty estimates for ACA - constant with the case study data.
5. Discussion
Our results suggested that it was important to consider temporal frailties in addition to spatial frailties for survival analysis. Both types of frailties allowed for modeling of the unmeasured confounding in the data beyond what the known fixed effect risk factors could explain. Further, the temporal frailty had the ability to represent changes in survival experience over time due to events such as a major hurricane or health-related government legislation.
The simulation results indicated that it was necessary to include a temporal frailty when temporal variation was present in the simulation ground truth. Moreover, even when there was no true temporal structure assumed in the simulation, the model did not suffer in terms of goodness-of-fit or recovery when a separate temporal frailty was included, although the bias specifically related to the frailties was slightly increased. Beyond that, the simulation results also indicated that (1) the fixed effect estimates were recovered well even when the frailties were misspecified and (2) the frailty estimates were recovered nearly perfectly when appropriately specified or with some misspecification.
These results also illustrated the importance of choosing the appropriate type of frailty structure: additive spatial and temporal or a spatio-temporal interaction. With the additive spatial and temporal frailty structure, examination of the frailties' associations could be performed independently for space and time, but the event effect must be defined a priori. Adding an ability for the data to select the change points for the event-related frailty is of interest for future explorations. For models that contained the spatio-temporal frailty structure (ST1 and ST2), interactions between space and time were allowed. This could potentially be a more flexible modeling approach and ideal in certain situations, but interpretation and secondary assessments were more difficult. Both the simulation study and the case study illustrated that DIC was useful in choosing between these two options for the data at hand.
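As a rough illustration of the distinction, the two structures can be written schematically on the log survival time scale of an accelerated failure time model. The notation is assumed for this sketch and may not match the symbols or the exact form used earlier in the article: $u_i$ is a spatial frailty for area $i$, $\gamma_t$ a temporal frailty for diagnosis year $t$, $\psi_{it}$ a space-time interaction term, and $\varepsilon_{ijt}$ an error term for individual $j$ with scale $\sigma$.

```latex
\begin{align*}
\text{additive:} \quad
  \log T_{ijt} &= \mathbf{x}_{ijt}^{\top}\boldsymbol{\beta} + u_i + \gamma_t + \sigma\,\varepsilon_{ijt}, \\
\text{interaction:} \quad
  \log T_{ijt} &= \mathbf{x}_{ijt}^{\top}\boldsymbol{\beta} + u_i + \gamma_t + \psi_{it} + \sigma\,\varepsilon_{ijt}.
\end{align*}
```

Under the additive form the temporal pattern is shared by all areas, which is what allows space and time to be examined separately; the interaction term lets the temporal pattern differ by area at the cost of many more parameters.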
Identifiability was a concern with these models as multiple temporal random effects were included in a single linear predictor. This issue has been discussed with other models of similar structure, e.g., wherein multiple spatial random effects were included (Waller and Carlin, 2010). However, our results suggested that we did not have the same issues with identifiability, since the simulation study appropriately recovered parameters when the event effect was strong enough. Beyond that, the event effect differed from the annual effect in that the change point(s) fell at different points in time; however, this might not always be the case. Finally, the model goodness-of-fit measures indicated whether it was important to include the separate event frailty. Thus, we believed that if the event was not strong enough to be separable and identifiable from the annual frailty, the goodness-of-fit measures would indicate that the event frailty was not necessary. The case study illustrated this, as there was likely still an impact from Hurricane Katrina, but that impact was captured in the annual frailty rather than by a separate event-related frailty. Ultimately, we felt that it was best to start with the simpler models and then try to improve upon them with additional parameters.
Using these spatial and temporal frailties led to an improved understanding of our case study outcome of interest. By incorporating frailties into the model, the produced latent effects were examined and the unmeasured confounding in the data was assessed. We demonstrated how to make these interpretations with respect to BrCa-specific mortality in Louisiana SEER data for the years 2000–2013 as well as different ways of defining the temporal frailty parameter. First, we illustrated how to determine the best construction of the temporal frailty parameter, e.g., constant, lagged, jump and return, or trend, by initially fitting the trend parameterization. To assess the spatial and temporal frailties, we compared them to several available risk factors at the same spatial and temporal resolution. Descriptions of the risk factors explored, tables of estimates, plots of the comparisons, and correlations between the frailties and risk factors are included in Table 4.1 and Figure 4.1 of the supplementary material available at Biostatistics online. Based on these results, we believed that the overall temporal frailty was associated with increased number and quality of BrCa screenings (number of mammograms), access to health care (total number of hospitals), socio-economic status (% persons in poverty), and the impacts of HK as well as the passing of the ACA for those diagnosed following the given event. The spatial frailty estimates were very similar to the estimates seen in our previous work (Carroll and others, 2017), where we determined that they were associated with socio-demographic status, access to and quality of health care, access to fresh food, and chemical exposure related to working in agriculture. Table 4.1 of the supplementary material available at Biostatistics online displays secondary model fit results, which indicated that these risk factors continued to be the ones associated with this spatial frailty, and Section 4.2 of the supplementary material available at Biostatistics online describes the spatial and temporal secondary frailty assessments.
6. Conclusion
The incorporation of temporal frailties in addition to spatial frailties in survival analysis led to better fitting models and improved inference. Our methods addressed a wide range of spatial and temporal options for structuring frailties and examined the benefits of using these different structures in certain settings. Ultimately, we believed that the temporal frailties could play an important role in representing the unmeasured risk factors related to improvements in disease knowledge and screenings as well as events that have the potential to alter survival.
Supplementary Material
Acknowledgments
This research was supported by the Intramural Research Program of NIH, National Institute of Environmental Health Sciences.
Conflict of Interest: None declared.
References
- American Cancer Society. (2016). Breast cancer facts and figures 2015-2016. https://goo.gl/CtmxyY (accessed May 2017).
- Banerjee, S. and Carlin, B. P. (2003). Semiparametric spatio-temporal frailty modeling. Environmetrics 14, 523–535.
- Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2003a). Hierarchical multivariate CAR models for spatio-temporally correlated survival data (with discussion). In: Bernardo, J. M., Bayarri, M. J., Berger, J. O., Dawid, A. P., Heckerman, D., Smith, A. F. M. and West, M. (editors), Bayesian Statistics. Oxford: Oxford University Press, pp. 45–63.
- Banerjee, S., Wall, M. M. and Carlin, B. P. (2003b). Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics 4, 123–142.
- Bastos, L. S. and Gamerman, D. (2006). Dynamic survival models with spatial frailty. Biostatistics 12, 441–460.
- Batista, N. E. and Antón, O. A. (2013). Spatiotemporal analysis of lung cancer incidence and case fatality in Villa Clara Province, Cuba. MEDICC Review 15, 16–21.
- Besag, J. and Green, P. J. (1993). Spatial statistics and Bayesian computation. Journal of the Royal Statistical Society. Series B (Methodological) 55, 25–37.
- Carroll, R. (2016). fillmap: Create maps with spatialpolygons objects. R package version 0.0.0.9000.
- Carroll, R., Lawson, A. B., Faes, C., Kirby, R. S., Aregay, M. and Watjou, K. (2015). Comparing INLA and OpenBUGS for hierarchical Poisson modeling in disease mapping. Spatial and Spatio-temporal Epidemiology 14–15, 45–54.
- Carroll, R., Lawson, A. B., Faes, C., Kirby, R. S., Aregay, M. and Watjou, K. (2016). Spatio-temporal Bayesian model selection for disease mapping. Environmetrics 27, 466–478.
- Carroll, R., Lawson, A. B., Jackson, C. L. and Zhao, S. (2017). Spatial assessment of breast cancer-specific mortality using Louisiana SEER data. Social Science & Medicine (1982) 11, 1–7.
- Carroll, R. and Zhao, S. (2018). Gaining relevance from the random: Interpreting observed spatial heterogeneity. Spatial and Spatio-temporal Epidemiology 25, 11–17.
- Christensen, R. and Johnson, W. (1988). Modelling accelerated failure time with a Dirichlet process. Biometrika 75, 693–704.
- Collett, D. (2013). Modelling survival data in medical research. In: Faraway, J. J., Tanner, M. A., Carlin, B. P., Zidek, J. and Blitzstein, J. K. (editors), Texts in Statistical Science. Boca Raton: CRC Press, pp. 221–274.
- Henderson, R., Shimakura, S. and Gorst, D. (2002). Modeling spatial variation in leukemia survival data. Journal of the American Statistical Association 97, 965–972.
- Knorr-Held, L. (2000). Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine 19, 2555–2567.
- Lawson, A. B., Carroll, R., Faes, C., Kirby, R. S., Aregay, M. and Watjou, K. (2017). Spatio-temporal multivariate mixture models for Bayesian model selection in disease mapping. Environmetrics 28, e2465.
- Li, G., Best, N., Hansell, A. L., Ahmed, I. and Richardson, S. (2012). BaySTDetect: Detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics 13, 695–710.
- Li, Y. and Ryan, L. (2002). Modeling spatial survival data using semiparametric frailty models. Biometrics 58, 287–297.
- Lunn, D., Jackson, C., Best, N., Thomas, A. and Spiegelhalter, D. (2013). The BUGS Book: A Practical Introduction to Bayesian Analysis, 1st edition. Boca Raton: CRC Press.
- Onicescu, G., Lawson, A., Zhang, J., Gebregziabher, M., Wallace, K. and Eberth, J. M. (2017). Bayesian accelerated failure time model for space-time dependency in a geographically augmented survival model. Statistical Methods in Medical Research 26, 2244–2256.
- Orbe, J., Ferreira, E. and Nunez-Anton, V. (2002). Comparing proportional hazards and accelerated failure time models for survival analysis. Statistics in Medicine 21, 3493–3510.
- Silva, G. L. and Amaral-Turkman, M. A. (2005). Bayesian analysis of an additive survival model with frailty. Communication in Statistics A 33, 2517–2533.
- Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society Series B 64, 583–639.
- Surveillance, Epidemiology, and End Results Program. (2015). SEER Stat Fact Sheets: Female Breast Cancer. National Cancer Institute.
- R Core Team. (2015). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
- Thomas, A., Best, N., Lunn, D., Arnold, R. and Spiegelhalter, D. (2014). GeoBUGS User Manual. Cambridge, UK: MRC Biostatistics Unit.
- Thomas, A., O'Hara, B., Ligges, U. and Sturtz, S. (2006). Making BUGS open. R News 6, 12–17.
- Waller, L. A. and Carlin, B. P. (2010). Disease mapping. In: Gelfand, A. E., Diggle, P. J., Fuentes, M. and Guttorp, P. (editors), Handbook of Spatial Statistics. Boca Raton: CRC Press, pp. 217–244.
- Waller, L. A., Carlin, B. P., Xia, H. and Gelfand, A. E. (1997). Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association 92, 607–617.
- Wieder, R., Shafiq, B. and Adam, N. (2016). African American race is an independent risk factor in survival from initially diagnosed localized breast cancer. Journal of Cancer 7, 1587–1598.
- Wikle, C., Berliner, M. and Cressie, N. (1999). Hierarchical Bayesian space-time models. Environmental and Ecological Statistics 5, 117–154.
- Zhang, J. and Lawson, A. B. (2011). Bayesian parametric accelerated failure time spatial model and its application to prostate cancer. Journal of Applied Statistics 38, 591–603.