Bayesian negative binomial regression model with unobserved covariates for predicting the frequency of north atlantic tropical storms

Xun Li; Joyee Ghosh; Gabriele Villarini

doi:10.1080/02664763.2022.2063266

2022 May 1;50(9):2014–2035.

PMCID: PMC10291923 PMID: 37378269

Abstract

Predicting the annual frequency of tropical storms is of interest because it provides basic information for improved preparation for these storms. Sea surface temperatures (SSTs) averaged over the hurricane season can predict annual tropical cyclone activity well. However, predictions must be made before the hurricane season, when these predictors are not yet observed. Several climate models issue forecasts of the SSTs, which can be used instead. Existing models use the forecasts of SSTs as surrogates for the true SSTs. We develop a Bayesian negative binomial regression model that distinguishes between the true SSTs and their forecasts, both of which are included in the model. For prediction, the true SSTs may be regarded as unobserved predictors and sampled from their posterior predictive distribution. A small fraction of the SST forecasts from the climate models is also missing. Thus, we propose a model that can simultaneously handle missing predictors and variable selection uncertainty. If the main goal is prediction, an interesting question is whether we should include predictors in the model that are missing at the time of prediction. We attempt to answer this question and demonstrate that our model can provide gains in prediction.

Keywords: Bayesian model averaging, Bayesian variable selection, count data, missing covariates, Markov chain Monte Carlo, prediction sets

1. Introduction

Prediction of tropical cyclone (TC) activity for the North Atlantic region started in the early 1980s [5,6], while the very first attempt to predict TC activity around the world was made by Neville Nicholls in the late 1970s [13]. Since then, the prediction of North Atlantic tropical storms has received increasing attention, and previous studies have built forecast systems that give retrospective forecasts for the hurricane season, which reaches its peak during August to October [2,6,24]. Although it is difficult to give accurate forecasts of TC activity 9–10 months prior to a particular season [18], considerable progress has been made for shorter lead times [2,17,19,22].

In this paper, we focus on developing models for the annual frequency of tropical storms, which is one of the measures of the severity of a hurricane season. Knowing in advance whether a season will be active can help improve preparedness. The existing literature [20,23] has shown that SSTs averaged over the peak hurricane season are good predictors of tropical storm activity (aggregated over the hurricane season for a given year). For example, warmer temperature in the tropical Atlantic Ocean during August–October is expected to be favorable for the formation of tropical storms. However, observed SSTs are available only after the hurricane season and, thus, cannot be directly used for prediction. Instead, forecasts of SSTs are available from multiple climate models, also known as general circulation models (GCMs). We focus on forecasts of Atlantic sea surface temperatures ($SST_{Atl}$) and tropical mean sea surface temperatures ($SST_{Trop}$) by five GCMs (GFDLB01, GFDLA06, GFDL, NASA, CMC2) from the North American Multi-Model Ensemble Project (NMME; Kirtman et al. [7]). The NMME [7] represents a multi-agency supported intraseasonal to interannual prediction experiment. A number of research groups in North America have been providing outputs from their hindcasts and real-time forecasts since 2011. The GCMs we use provide a set of monthly forecasts from 1982 to the present. Predictions are available with a lead time from 9 to 12 months; multiple members are available for each GCM, and here we consider their ensemble average as representative of a given model.

The response variable is the total number of tropical storms that occur during August to October of each year, and the predictors are SSTs (true or forecasts) averaged over the same period (August to October of that year). To be clear, we have data aggregated for each year and not for each month. The predictors (SSTs) are time varying and capture the dependency across years; thus, time series models are typically not used in the climate science literature for this setting. Based on exploratory data analysis, we found the residuals satisfy the independence assumption reasonably well, so we do not consider time series models in this work. Some plots for model diagnostics are included in the Supplemental Material. Each of the five climate models issues a new forecast of SSTs every month, so our predictors change every month, and in this paper, we focus on monthly forecasts issued in June, July, and August. The SST forecasts change every month; however, they are all forecasts of the same quantity: the average true SST during August to October of a given year.
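As a rough illustration of this setup, the sketch below writes down one common form of a negative binomial regression for a yearly storm count with two SST predictors. The log link, the mean and dispersion parameterization, the function names, and the numerical values are all assumptions made for illustration; they are not the specification developed in Section 3.

```python
import math


def expected_count(beta0, beta_atl, beta_trop, sst_atl, sst_trop):
    """Mean storm count under an assumed log link."""
    return math.exp(beta0 + beta_atl * sst_atl + beta_trop * sst_trop)


def nb_log_pmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and dispersion r.

    Assumed parameterization: variance = mu + mu**2 / r.
    """
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


# Hypothetical coefficients and SST predictor values for a single year.
mu = expected_count(beta0=1.8, beta_atl=0.4, beta_trop=-0.2,
                    sst_atl=0.5, sst_trop=0.1)
# Probability of observing exactly 8 storms in that year under this sketch.
prob_8 = math.exp(nb_log_pmf(8, mu, r=5.0))
```

Smaller values of `r` give heavier overdispersion relative to a Poisson model with the same mean, which is the usual motivation for a negative binomial likelihood with count data.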

Because the structure of the data is somewhat complicated, we provide a schematic representation in Table 1. In Table 1, Year, TS, and SST denote the calendar year, the count of tropical storms in the year during August to October, and the average true SST during August to October of the year, respectively. The averages are obtained from monthly SST data. The SST forecasts from five climate models are denoted by SST$_{F1}$, $\dots$, SST$_{F4}$, SST$_{F5}$, and in this work, we focus on the forecasts issued in June, July, and August. During the period 1958–1981, only TS and SST (true) are available, which is denoted with a checkmark (✓). The climate models did not issue forecasts during that period, so the forecasts are unavailable, which is denoted with a cross-mark (✗). During the period 1982–2018, TS and SST (true) are available as before, and most of the SST forecasts are available. However, a few SST forecasts are missing in that period because some climate models did not issue forecasts in all years; these missing forecasts are also denoted by cross-marks. In reality, there are two kinds of SSTs that are used as predictors (Atl and Trop), but we do not show that information in Table 1 for simplicity.

Table 1.

Schematic presentation of the data.

	True		Forecasts Issued in June				Forecasts Issued in July				Forecasts Issued in August
Year	TS	SST	SST$_{F1}$	···	SST$_{F4}$	SST$_{F5}$	SST$_{F1}$	···	SST$_{F4}$	SST$_{F5}$	SST$_{F1}$	···	SST$_{F4}$	SST$_{F5}$
1958	✓	✓	✗	···	✗	✗	✗	···	✗	✗	✗	···	✗	✗
1959	✓	✓	✗	···	✗	✗	✗	···	✗	✗	✗	···	✗	✗
···	···	···	···	···	···	···	···	···	···	···	···	···	···	···
···	···	···	···	···	···	···	···	···	···	···	···	···	···	···
1981	✓	✓	✗	···	✗	✗	✗	···	✗	✗	✗	···	✗	✗
1982	✓	✓	✓	···	✓	✓	✓	···	✓	✓	✓	···	✓	✓
1983	✓	✓	✓	···	✓	✓	✓	···	✓	✓	✓	···	✓	✓
···	···	···	···	···	···	···	···	···	···	···	···	···	···	···
···	···	···	···	···	···	···	···	···	···	···	···	···	···	···
2011	✓	✓	✓	···	✓	✗	✓	···	✓	✗	✓	···	✓	✗
···	···	···	···	···	···	···	···	···	···	···	···	···	···	···
2014	✓	✓	✓	···	✗	✓	✓	···	✓	✓	✓	···	✓	✓
···	···	···	···	···	···	···	···	···	···	···	···	···	···	···
2018	✓	✓	✓	···	✓	✓	✓	···	✓	✓	✓	···	✓	✓
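The availability pattern in Table 1 can be encoded as a small lookup, sketched below. The model labels `F1` to `F5` and the helper name are hypothetical, and the gaps listed are only those visible in the schematic rows for 2011 and 2014; the real data may contain other gaps hidden behind the ellipses.

```python
FIRST_YEAR, FIRST_FORECAST_YEAR, LAST_YEAR = 1958, 1982, 2018
MONTHS = ("June", "July", "August")
MODELS = ("F1", "F2", "F3", "F4", "F5")

# Gaps read off the schematic rows for 2011 and 2014; illustrative only.
GAPS = {(2011, "F5", m) for m in MONTHS} | {(2014, "F4", "June")}


def forecast_available(year, model, month):
    """True if the SST forecast is marked available in the schematic."""
    if year < FIRST_FORECAST_YEAR:
        return False  # the climate models issued no forecasts before 1982
    return (year, model, month) not in GAPS


years = range(FIRST_YEAR, LAST_YEAR + 1)
# Years with true SSTs and storm counts but no forecasts at all.
n_without_forecasts = sum(1 for y in years if y < FIRST_FORECAST_YEAR)
# Years in which every model has a June forecast under the schematic.
n_complete_june = sum(
    1 for y in years if all(forecast_available(y, m, "June") for m in MODELS)
)
```

Under this encoding, the 1958–1981 block contributes information only about the relationship between storm counts and true SSTs, while the later block also links forecasts to true SSTs.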

Method	Cor.Pearson	Cor.Spearman	RMSE	MAE	Coverage (Equal-tailed)	Coverage (HPD)	Size (Equal-tailed)	Size (HPD)
Hierarchical	0.52	0.49	3.76	2.81	0.95	0.94	17.33	14.64
No missing	0.74	0.70	3.00	2.27	0.93	0.91	10.27	9.65
Method 1	0.69	0.66	3.24	2.44	0.94	0.92	11.18	10.42
Method 2	0.70	0.67	3.18	2.40	0.91	0.90	10.22	9.48
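To indicate how summaries such as the RMSE, MAE, Coverage, and Size columns can be computed from posterior predictive draws, the following is a minimal sketch. It assumes that the point prediction is the predictive mean, that coverage is the fraction of test years whose observed count falls in the prediction set, and that size is the number of integer counts in the set; the function names are hypothetical, and only the equal-tailed set is shown (the HPD set is not).

```python
import math


def equal_tailed_set(draws, level=0.95):
    """Endpoints of an equal-tailed set from empirical quantiles of the draws."""
    s = sorted(draws)
    n = len(s)
    lower = s[math.floor((1 - level) / 2 * (n - 1))]
    upper = s[math.ceil((1 + level) / 2 * (n - 1))]
    return lower, upper


def evaluate(observed, draws_per_year, level=0.95):
    """Coverage, mean set size, RMSE, and MAE over the test years."""
    covered, sizes, errors = 0, [], []
    for y, draws in zip(observed, draws_per_year):
        lower, upper = equal_tailed_set(draws, level)
        covered += lower <= y <= upper
        sizes.append(upper - lower + 1)  # number of integer counts in the set
        errors.append(y - sum(draws) / len(draws))  # error of the predictive mean
    n = len(observed)
    return {
        "coverage": covered / n,
        "size": sum(sizes) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
    }
```

A method is preferable when it keeps coverage near the nominal level while producing smaller sets, which is why the two columns are best read together.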

Predictors	True value	Method 1 mean (retains covariates 1 and 2)	Method 2 mean (discards covariates 1 and 2)
Intercept	1.80	1.80	1.81
1	0.40	0.32	–
2	−0.20	−0.14	–
3	0.35	0.32	0.57
4	−0.25	−0.20	−0.25
5	0.00	0.00	0.00
6	0.00	0.00	0.00
7	0.00	0.00	−0.01
8	0.00	0.00	0.00
9	0.00	0.00	0.00
10	0.00	0.00	0.00
11	0.00	0.00	0.00
12	0.00	0.00	−0.04

Predictors	Method 1	Method 2	No Missing
1	0.89	–	0.89
2	0.48	–	0.48
3	0.27	0.97	0.27
4	0.14	0.15	0.14
5	0.09	0.10	0.09
6	0.10	0.13	0.10
7	0.11	0.11	0.11
8	0.11	0.13	0.11
9	0.10	0.11	0.10
10	0.12	0.13	0.12
11	0.11	0.12	0.11
12	0.12	0.14	0.12

Predictor	June	July	August
$GFDLA06_{Atl}$	0.64	0.85	0.28
$GFDLA06_{Trop}$	0.16	0.20	0.20
$GFDL_{Atl}$	0.12	0.10	0.12
$GFDL_{Trop}$	0.14	0.16	0.26
$GFDLB01_{Atl}$	0.34	0.35	0.33
$GFDLB01_{Trop}$	0.17	0.19	0.21
$NASA_{Atl}$	0.27	0.16	0.16
$NASA_{Trop}$	0.27	0.17	0.20
$CMC2_{Atl}$	0.09	0.09	0.11
$CMC2_{Trop}$	0.19	0.13	0.19
$OBS_{Atl}$	0.24	0.16	0.16
$OBS_{Trop}$	0.50	0.69	0.66
$GFDLA06_{Atl_{July}}$	–	–	0.79

Method	Cor.Pearson	Cor.Spearman	RMSE	MAE	Coverage (Equal-tailed)	Coverage (HPD)	Size (Equal-tailed)	Size (HPD)
Hierarchical	0.65	0.68	2.92	2.50	1.00	1.00	12.00	11.38
Method 1	0.87	0.93	1.77	1.38	1.00	1.00	15.25	13.88
Method 1 (added)	0.85	0.97	1.50	1.25	1.00	1.00	15.00	13.88
Method 2	0.63	0.69	2.29	1.75	1.00	1.00	15.13	13.75
Method 2 (added)	0.78	0.73	1.80	1.25	1.00	1.00	14.75	13.50

Method	Cor.Pearson	Cor.Spearman	RMSE	MAE	Coverage (Equal-tailed)	Coverage (HPD)	Size (Equal-tailed)	Size (HPD)
Hierarchical	0.69	0.72	2.98	2.38	1.00	1.00	12.50	12.13
Method 1	0.76	0.82	1.87	1.25	1.00	1.00	14.13	13.38
Method 2	0.71	0.70	2.00	1.50	1.00	1.00	13.50	13.13

Method	Cor.Pearson	Cor.Spearman	RMSE	MAE	Coverage (Equal-tailed)	Coverage (HPD)	Size (Equal-tailed)	Size (HPD)
Hierarchical	0.44	0.40	3.14	2.38	0.88	0.88	12.50	12.00
Method 1	0.45	0.34	2.78	2.00	1.00	1.00	13.75	13.13
Method 2	0.37	0.27	2.92	2.25	1.00	1.00	13.50	13.13
