Abstract
Seasonal-to-decadal predictions are inevitably uncertain, depending on the size of the predictable signal relative to unpredictable chaos. Uncertainties can be accounted for using ensemble techniques, permitting quantitative probabilistic forecasts. In a perfect system, each ensemble member would represent a potential realization of the true evolution of the climate system, and the predictable components in models and reality would be equal. However, we show that the predictable component is sometimes lower in models than in observations, especially for seasonal forecasts of the North Atlantic Oscillation and multiyear forecasts of North Atlantic temperature and pressure. In these cases the forecasts are underconfident, with each ensemble member containing too much noise. Consequently, most deterministic and probabilistic measures underestimate potential skill and idealized model experiments underestimate predictability. However, skilful and reliable predictions may be achieved using a large ensemble to reduce noise and adjusting the forecast variance through a postprocessing technique proposed here.
Keywords: seasonal prediction, decadal prediction, ensemble, predictability, reliability
Key Points
Model members can be too noisy and not potential realizations of the real world
Predictability may be underestimated by idealized experiments and skill measures
Can achieve skilful and reliable forecasts using large ensembles to reduce noise
1 Introduction
Individual weather events are generally not predictable more than a couple of weeks ahead. This is because the atmosphere is chaotic, so that infinitesimal differences in initial conditions grow over a few days into large-scale disturbances [Lorenz, 1963]. However, the atmosphere can be influenced by predictable slowly varying factors, leading to a prolonged shift in the climate. For example, in the tropics, sea surface temperature (SST) in the Pacific varies between warm and cool conditions every few years during the El Niño–Southern Oscillation (ENSO), influencing seasonal temperature and rainfall in many regions [e.g., Trenberth and Caron, 2000; Alexander et al., 2002; Smith et al., 2012]. In the extratropics, North Atlantic SST varies on multidecadal time scales, often referred to as the Atlantic Multidecadal Oscillation or Atlantic Multidecadal Variability (AMV), with associated decadal changes in climate over Europe, America, and Africa [Knight et al., 2006; Zhang and Delworth, 2006; Sutton and Hodson, 2007; Sutton and Dong, 2012], the Atlantic storm track position and/or strength [Wilson et al., 2009; Woollings et al., 2012; Frankignoul et al., 2013], and Atlantic hurricane frequency [Goldenberg et al., 2001; Smith et al., 2010; Dunstone et al., 2011]. Other drivers of atmospheric variability include external factors such as solar variability, changes in greenhouse gases, changes in aerosols, and internal variability including the Madden Julian Oscillation, sudden stratospheric warmings, and the Indian Ocean dipole [Smith et al., 2012].
Seasonal-to-decadal climate predictions aim to predict these drivers and their influence on the atmosphere using coupled general circulation models (GCMs) of the atmosphere, ocean, land, and cryosphere [e.g., Doblas-Reyes et al., 2013; Meehl et al., 2014; Smith et al., 2012; Kirtman et al., 2013]. They therefore predict changes in climate and the frequency of associated extreme events [Hamilton et al., 2012; Eade et al., 2012]. Time-averaged predictions may be decomposed into two components: (1) unpredictable noise resulting from the chaotic nature of the atmosphere and (2) a component that is potentially predictable because it is constrained by predictable factors, such as ENSO, AMV, or external forcing. Uncertainties are inevitable in seasonal-to-decadal prediction and depend on the size of the predictable signal relative to unpredictable chaos (signal-to-noise ratio), as well as imperfections in observations used for initial conditions, the fidelity of GCMs, and uncertainties in projected radiative perturbations [e.g., Hawkins and Sutton, 2011]. Ensembles are therefore created to assess uncertainties and enable quantitative probabilistic forecasts to be made [e.g., Palmer et al., 2004; Wang et al., 2009].
The likely skill of seasonal-to-decadal forecasts is assessed by analyzing tests over a historical period, referred to as hindcasts (forecasts made retrospectively but using only observations that would have been available at the time). The ensemble mean is expected to be more highly correlated with observations than are individual ensemble members because the unpredictable component of the model forecasts is reduced by averaging. However, assessment of other aspects of the ensemble including spread and reliability is also essential [Goddard et al., 2012; Corti et al., 2012; Ho et al., 2013]. Interpretation of the model skill is usually based on the implicit assumption that each model ensemble member represents a potential realization of the true evolution of the climate system that might have occurred due to the chaotic growth of infinitesimal perturbations to the initial conditions.
This approach relies on the predictable components in models and reality being equal. Recent results show that high skill only emerges for seasonal predictions of the winter North Atlantic Oscillation (NAO) [Scaife et al., 2014; Riddle et al., 2013] and multiyear predictions of Atlantic hurricane frequency [Smith et al., 2010] when taking the mean of a large ensemble. Indeed, Scaife et al. [2014] point out that correlation skill is higher than would be expected from the model signal-to-noise ratio [Kumar, 2009] implying that the predictable component in their NAO forecasts is smaller than in reality. Here we provide a more general assessment of the predictable component of seasonal and decadal predictions.
2 Methodology
We define the predictable component as the square root of the fraction of total variance that is predictable and seek to compare the predictable component in observations (PCobs) to that in model hindcasts (PCmod). The predictable component in reality is unknown. Previous studies have attempted to diagnose it from the ratio of low- to high-frequency variability [e.g., Boer, 2011] or by assessing potential predictability against a model ensemble member rather than the observations [e.g., Younas and Tang, 2013; Boer et al., 2013]. However, there is not necessarily a relationship between variability and predictability, and potential predictability is not necessarily related to actual predictability [Kumar et al., 2014]. Here we estimate PCobs directly from the fraction of the variance that can be explained by model forecasts, diagnosed from the Pearson correlation (r) between observations ("predictand") and the ensemble mean of model hindcasts ("predictor"), as r2 reflects the proportion of the variance of the predictand accounted for by the predictor [e.g., Wilks, 2006]. This is a lower bound, since future improvements to models and forecast techniques, and larger ensembles, may yield higher correlations. The predictable component in the model may be estimated from the variance of a large (but finite) ensemble mean relative to the variance of individual ensemble members. This is an upper bound since the variance of the ensemble mean would be reduced if a larger ensemble were available. We define a "ratio of predictable components" (RPC) as
$$\mathrm{RPC} = \frac{PC_{\mathrm{obs}}}{PC_{\mathrm{mod}}} \geq \frac{r}{\sqrt{\sigma^2_{\mathrm{sig}} / \sigma^2_{\mathrm{tot}}}} \qquad (1)$$

where σ²sig is the signal variance (the variance in time of the model ensemble mean) and σ²tot is the average variance in time of individual members. Ideally, the RPC should be equal to one for a forecast system which perfectly reflects the actual predictability.
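The estimate in equation (1) can be sketched at a single grid point as follows; this is a minimal illustration under our own naming (the function name and array layout are not from the paper), assuming detrended anomalies with ensemble members along the first axis:

```python
import numpy as np

def ratio_of_predictable_components(ens, obs):
    """Estimate RPC = PC_obs / PC_mod at one grid point (equation (1)).

    ens : array of shape (n_members, n_times), hindcast anomalies
    obs : array of shape (n_times,), observed anomalies
    """
    ens_mean = ens.mean(axis=0)
    r = np.corrcoef(obs, ens_mean)[0, 1]        # lower bound on PC_obs
    sig_var = ens_mean.var(ddof=1)              # sigma^2_sig: signal variance
    tot_var = ens.var(axis=1, ddof=1).mean()    # sigma^2_tot: mean member variance
    return r / np.sqrt(sig_var / tot_var)       # RPC, a lower bound
```

For a synthetic ensemble built as a common signal plus independent member noise, with observations sharing the same signal, this estimator returns values near one, as expected for a perfect system; RPC values above one would indicate members noisier than the real world.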
We assess the RPC in seasonal-to-decadal hindcasts against gridded observations of near-surface temperature (SAT), mean sea level pressure (MSLP), and precipitation (PREC) (details in Table S1 in the supporting information). We analyze seasonal hindcasts from the Met Office Global Seasonal forecasting system 5 (GloSea5) [MacLachlan et al., 2014], starting around 1 November each year from 1992 to 2011 (20 start dates, 24 members), and a multimodel ensemble of decadal hindcasts from the Met Office decadal prediction system (DePreSys) [Knight et al., 2014; Smith et al., 2010, 2013], starting on 1 November each year from 1960 to 2005, and four other models from the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al., 2012] that had annual start dates available for a similar period to DePreSys (46 start dates, 70 members; see Table S2 for details). We present results from this grand ensemble, but the DePreSys and CMIP5 ensembles were also analyzed separately with similar results (Figure S3).
Modeled and observed fields are expressed as anomalies, using a full-field bias correction method as described in Smith et al. [2013] such that each model is treated in the same way, regardless of initialization method. Modeled and observed trends are removed by linearly detrending, so that correlation is not inflated by the capturing of a climate change signal.
We test the significance of our results using a nonparametric block bootstrap method [Wilks, 2006; Smith et al., 2013] (see supporting information for further details). We present results without the use of cross validation for model corrections, as cross validation leads to an underestimate of correlation [Smith et al., 2013; Gangsto et al., 2013], but our conclusions are not sensitive to this choice (Figures S2, S4, S5, and S7). We also assess probabilistic performance using reliability diagrams and the Brier score [Wilks, 2006]. Model probabilities are calculated using the fraction of members that predict the event, with a correction for finite sample size [Wilks, 2006]. We present results for the event where the variable in question is above the median value, as used in Corti et al. [2012]; however, similar results were also found for terciles.
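A block bootstrap of this kind can be sketched as follows; a minimal moving-block version for a correlation confidence interval, with the block length, sample sizes, and function names chosen purely for illustration (the paper's exact procedure is described in its supporting information). Resampling whole blocks of consecutive start dates preserves serial correlation that would be destroyed by resampling individual years:

```python
import numpy as np

def block_bootstrap_corr_interval(x, y, block_len=5, n_boot=2000, seed=0):
    """5-95% bootstrap interval for corr(x, y) using a moving-block
    bootstrap: whole blocks of consecutive times are resampled with
    replacement to retain autocorrelation within each block."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = -(-n // block_len)                 # ceil(n / block_len)
    starts = np.arange(n - block_len + 1)         # valid block start points
    corrs = np.empty(n_boot)
    for i in range(n_boot):
        chosen = rng.choice(starts, size=n_blocks, replace=True)
        idx = np.concatenate(
            [np.arange(s, s + block_len) for s in chosen])[:n]
        corrs[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.percentile(corrs, [5, 95])
```

A correlation is then judged significant at the 90% level if the resulting interval excludes zero.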
3 Results
Figure 1 shows the RPC for seasonal hindcasts for December to February (lead times 2 to 4 months, Figures 1a–1c) and decadal hindcasts of 4 year means at lead times of 2–5 years (Figures 1d–1f). The RPC is not significantly different to the expected value of one for 70% and 57% of grid points for the seasonal and decadal hindcasts, respectively (averaging over all three variables and ignoring points where no observations are available). However, there are many regions where the RPC is significantly smaller than one, especially on longer time scales. This is indicative of overconfident forecasts in which ensemble members agree well with each other (high signal-to-noise ratio) but do not capture the observed variations (low correlation). Overconfidence in regions where the RPC is significantly smaller than one is confirmed by reliability diagrams (Figure 2a). Ideally, the slope would equal one, such that the observed frequency of occurrence equals the predicted probability. However, the slope is close to zero (red curve in Figure 2a), showing that the likelihood of an event occurring in reality is almost independent of the predicted probability.
Figure 1.
The ratio of predictable components (RPC) for (a–c) seasonal hindcasts of December-January-February (DJF) means and (d–f) decadal hindcasts of 4 year annual means for years 2–5, for near-surface temperature (SAT, column 1), mean sea level pressure (MSLP, column 2), and precipitation (PREC, column 3). Model and observed data are smoothed over regions of 11.25° latitude by 12.5° longitude (15° by 15° for SAT), similar to previous studies [Smith et al., 2010; Eade et al., 2012; Goddard et al., 2012]. Stippling identifies where RPC is significantly different to one at the 90% level (see text for details). Regions of negative correlation are masked out as they imply zero skill (see Figure S1 for version with insignificant correlations masked).
Figure 2.
Reliability diagrams of a median threshold event from years 2–5 decadal hindcasts of MSLP, for regions where RPC is (a) significantly lower than one and (b) significantly greater than one (as Figure 1e). The red reliability curves are for bias-corrected model output; red frequency histograms display the percentage of total hindcasts allocated to each bin. The blue reliability curves are for the RPC-corrected model output, with vertical bars showing the 5–95% confidence interval and blue frequency histograms.
Overconfidence of seasonal forecasts is well known, and much research has been undertaken to improve reliability by increasing ensemble spread (see Williams et al. [2013] for a comparison of approaches). Some reduction of the RPC below one is also expected simply because our ensemble is finite: our r value underestimates the correlation a larger ensemble could achieve, while our σ²sig overestimates the true signal variance. However, there are also clear regions where the RPC is significantly greater than one. This is indicative of underconfidence, where the ensemble mean agrees relatively well with observations (high correlation) but ensemble members agree less well with each other (low model signal-to-noise ratio). Underconfidence in regions where the RPC is significantly greater than one is confirmed by reliability diagrams in which the slope is greater than one (Figure 2b, red curve). RPC values greater than one suggest that models underestimate the predictability of the real world. This might arise in the multimodel ensemble if some models lack sources of predictability. However, we find very similar RPC patterns (Figure S3) when our multimodel ensemble is split into a single model (HadCM3, 37 members) and the remaining four CMIP5 models (29 members). This suggests that the regions where predictability is underestimated by the multimodel ensemble, which is typically the most accurate forecast [Palmer et al., 2004], are also seen in individual models.
Regions where the RPC is greater than one appear to be associated with known variability of the climate system. For example, seasonal MSLP (Figure 1b) shows regions with a high RPC over both Iceland and the Azores, the centers of action of the NAO, with a low RPC in between as would be expected for variability associated with this oscillation. The decadal hindcasts show a high RPC for SAT in the North Atlantic (Figure 1d), which is the region most improved by initialization in decadal predictions [e.g., Doblas-Reyes et al., 2013; Smith et al., 2010] through improved initialization of the Atlantic Meridional Overturning Circulation [Pohlmann et al., 2013; Robson et al., 2012, 2014; Yeager et al., 2012]. Furthermore, a high RPC for MSLP in the tropical Atlantic (Figure 1e) and PREC in the Sahel (Figure 1f) is consistent with a high RPC for SAT in the North Atlantic, since Atlantic temperatures have been shown to drive variability in these regions [e.g., Zhang and Delworth, 2006; Dunstone et al., 2011].
High values of the RPC associated with the NAO and tropical Atlantic MSLP are consistent with skilful predictions of the NAO [Scaife et al., 2014], Arctic Oscillation [Riddle et al., 2013], and Atlantic hurricane frequency [Smith et al., 2010] obtained with large ensembles. The models underestimate the true predictable component for these variables, and each ensemble member contains too much noise. However, taking the mean of a large ensemble leaves a predictable component that correlates well with reality, more so than would be expected from the original signal-to-noise ratio. This is illustrated in Figures 3a and 3b with time series of the NAO from the seasonal hindcasts and MSLP averaged over the eastern part of the hurricane main development region (MDR; 10–20°N, 60–20°W) from the decadal hindcasts. In both cases the ensemble spread is too large given the high correlations (r = 0.63 for the NAO and 0.71 for MDR MSLP, both significant at the 99% level), resulting in an RPC significantly higher than one at the 90% level (RPC = 2.3 for the NAO and 2.4 for MDR MSLP).
Figure 3.
Time series of (a) the North Atlantic Oscillation (NAO) for DJF from seasonal hindcasts and (b) MSLP in the eastern Hurricane Main Development Region (MDR) for years 2–5 from decadal hindcasts. Observations are shown by black curves, and ensemble mean model forecasts as red curves with grey shading showing the ensemble 5–95% range. (c and d) Results after applying the RPC adjustment. The NAO index is calculated as the difference in MSLP between grid points containing Stykkisholmur, Iceland (65°N, 22°W), and Ponta Delgada in the Azores (37°N, 25°W) [e.g., Jones et al., 1997]. The MDR index is calculated as the area-averaged MSLP for the region 10–20°N, 60–20°W in which RPC is greater than one.
4 RPC Correction
Model bias is necessarily, and routinely, corrected as a postprocessing step [Stockdale, 1997; International CLIVAR Project Office, 2011]. Likewise, where the RPC is not equal to the desired value of one, we propose that an additional postprocessing adjustment should be applied. Various methods have been proposed to correct overconfidence by increasing ensemble spread [e.g., Raftery et al., 2005; Roulston and Smith, 2003], but these do not address the problem of underconfidence highlighted here and in other studies [Scaife et al., 2014; Ho et al., 2013]. Other methods may increase or decrease ensemble spread to improve forecast reliability [e.g., Gneiting et al., 2005; Hamill and Colucci, 1997; Eckel and Walters, 1998] but are not specifically designed to adjust the ensemble mean, which is required to match the predictable variances in the forecasts and observations. We therefore propose an ensemble adjustment based on linear correction with rescaling [Williams et al., 2013], but with parameters diagnosed to ensure that the RPC equals one. This is achieved by computing an adjusted ensemble mean ($\bar{Y}'_t$) such that its variance over time ($t$) is equal to the predictable component of the observed variance ($\sigma^2_{\mathrm{obs}} r^2$):

$$\bar{Y}'_t = \left(\bar{Y}_t - \langle\bar{Y}\rangle\right)\frac{\sigma_{\mathrm{obs}}\, r}{\sigma_{\mathrm{sig}}} \qquad (2)$$

where $\bar{Y}_t$ is the original hindcast ensemble mean value for a given start time $t$ and $\langle\bar{Y}\rangle$ is the average of these over all start times. We additionally transform each ensemble member ($Y'_{mt}$) such that its variance about the new ensemble mean, i.e., the variance of the ensemble noise, becomes equal to the unpredictable variance of the observations, namely, $\sigma^2_{\mathrm{obs}}(1 - r^2)$:

$$Y'_{mt} = \bar{Y}'_t + \left(Y_{mt} - \bar{Y}_t\right)\frac{\sigma_{\mathrm{obs}}\sqrt{1 - r^2}}{\sigma_{\mathrm{noi}}} \qquad (3)$$

where $Y_{mt}$ is the original ensemble member value and $\sigma^2_{\mathrm{noi}}$ is the original noise variance, i.e., the variance of the $M$ members about the ensemble mean over the $T$ start times:

$$\sigma^2_{\mathrm{noi}} = \frac{1}{MT}\sum_{t=1}^{T}\sum_{m=1}^{M}\left(Y_{mt} - \bar{Y}_t\right)^2 \qquad (4)$$

This ensures that the total variance of the model is equal to $\sigma^2_{\mathrm{obs}}$, with the predictable component now accounting for a fraction $r^2$ of this.
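Equations (2)–(4) can be applied at a single grid point as in the following sketch; the function and variable names are ours, and we assume anomalies with ensemble members along the first axis:

```python
import numpy as np

def rpc_adjust(ens, obs):
    """Apply the RPC correction of equations (2)-(4) at one grid point.

    ens : (n_members, n_times) hindcast anomalies; obs : (n_times,).
    Returns the adjusted ensemble mean and adjusted members, whose
    signal and noise variances match sigma2_obs*r^2 and
    sigma2_obs*(1 - r^2) respectively.  In the text the correction is
    applied only where the correlation r is significantly positive.
    """
    ens_mean = ens.mean(axis=0)
    r = np.corrcoef(obs, ens_mean)[0, 1]
    sig_std = ens_mean.std(ddof=1)                  # sigma_sig
    obs_std = obs.std(ddof=1)                       # sigma_obs
    noise = ens - ens_mean                          # member deviations
    noi_std = np.sqrt((noise ** 2).mean())          # sigma_noi, eq. (4)
    mean_adj = (ens_mean - ens_mean.mean()) * obs_std * r / sig_std           # eq. (2)
    members_adj = mean_adj + noise * obs_std * np.sqrt(1 - r ** 2) / noi_std  # eq. (3)
    return mean_adj, members_adj
```

By construction, the adjusted ensemble mean has variance equal to the predictable part of the observed variance, and the adjusted members scatter about it with exactly the unpredictable part, so the total model variance matches that of the observations.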
We illustrate the effect of this RPC correction for the NAO and MDR MSLP time series in Figures 3c and 3d. For the NAO time series, the noise variance is roughly halved while the signal variance is quadrupled (Figure 3c). The MDR noise variance is reduced further, to a quarter of its original value, though the signal variance is only doubled (Figure 3d). The correction does not change correlation but does impact other skill measures. For example, mean squared skill score (MSSS) is moderately increased from 0.27 to 0.37 for the NAO and from 0.49 to 0.51 for MDR MSLP.
This RPC correction can be applied to global fields at the grid point level, excluding points with negative correlation to avoid artificial skill. We apply the correction where the correlation is significantly positive (using a t test at the 90% level [Wilks, 2006]) and replace the hindcast with climatology elsewhere. This procedure will impact most skill measures (other than correlation), and we illustrate its impact on mean squared skill score (MSSS) [Murphy, 1988] in Figure 4 for decadal hindcasts of MSLP in the North Atlantic (Figure S8 shows the same for seasonal hindcasts, though the results are less significant in this case). Regions of negative MSSS such as South America and the Labrador Sea are improved simply by using climatology (seen by comparing with Figure S6 where insignificant correlations are masked), and the corrected MSSS is not greater than that of climatology. However, MSSS is significantly improved by the RPC correction and is significantly greater than climatology in the central North Atlantic, consistent with the MDR index (Figures 3b and 3d).
Figure 4.
Mean squared skill score (MSSS) for decadal hindcasts of 4 year mean MSLP in the North Atlantic for years 2–5 for (a) the ensemble mean, (b) RPC-corrected ensemble mean, and (c) their difference. Stippling identifies regions where MSSS is significantly greater than for climatology shown in Figures 4a and 4b, or significantly improved by the RPC correction shown in Figure 4c, at the 95% level (see text for details).
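The MSSS of Murphy [1988] compares the mean squared error of the forecast with that of a climatological forecast; a minimal sketch (names ours) is:

```python
import numpy as np

def msss(forecast, obs):
    """Mean squared skill score relative to climatology [Murphy, 1988]:
    MSSS = 1 - MSE(forecast) / MSE(climatology), where the climatological
    forecast is the observed time mean.  Positive values beat climatology;
    a perfect forecast scores one."""
    mse = np.mean((forecast - obs) ** 2)
    mse_clim = np.mean((obs.mean() - obs) ** 2)
    return 1.0 - mse / mse_clim
```

Because the RPC correction rescales forecast variance toward the observed predictable variance, it changes the MSE of the hindcasts, and hence the MSSS, without altering the correlation.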
The RPC correction leads to some improvement in reliability, illustrated in Figure 2 for year 2–5 MSLP with similar results found for other variables and time scales (not shown). The forecast probabilities are transformed, resulting in reliability curves slightly closer to the diagonal (Figure 2, blue curves). In both low and high RPC cases, the uncorrected model displays only moderate sharpness, with probabilities somewhat clustered about that of climatology (red histograms in Figure 2). For regions with a low RPC (overconfidence), and hence generally low correlations (Figure 2a), the corrected hindcasts are clustered even more closely to climatology so as to become reliable but with very little sharpness (blue histograms in Figure 2a), and the Brier score has only been improved to equal that of climatology (0.28 reduced to 0.25, where an improved Brier score has a reduced value). For regions with a high RPC (underconfidence), the sharpness has been improved by the RPC correction such that model probabilities are more dispersed from the climatological value and now sample more extreme values (Figure 2b), and the Brier score has been improved (0.20 compared to 0.23 and 0.25 for the original model output and climatology, respectively). However, the reliability curve and frequency histogram may still be suboptimal, with a suggestion of overconfidence remaining for high probabilities (blue curve above the diagonal). On average, for regions where the RPC for decadal hindcasts is significantly greater (smaller) than one, the RPC correction increases (reduces) the signal variance of the ensemble mean by 260% (75%). The variance of individual ensemble members about this mean (i.e., the noise variance) is adjusted such that the total variance equals that of the observations.
5 Discussion and Conclusions
We have quantified the ratio of predictable components (RPC) in observations and models for seasonal and multiyear hindcasts of surface air temperature, mean sea level pressure, and precipitation. The RPC is not significantly different to the expected value of one for around 70% of grid boxes for GloSea5 seasonal hindcasts of DJF (December-January-February) a month ahead and 57% for years 2–5 in a multimodel ensemble of decadal hindcasts. The RPC is significantly smaller than one in many regions, especially for the multiyear hindcasts. RPC smaller than one occurs when model ensemble members agree with each other more than observations. The hindcasts are therefore overconfident. This is well known, and techniques have been developed to reduce overconfidence by increasing the ensemble spread. However, we find regions where the RPC is significantly greater than one in both seasonal and multiyear hindcasts. This appears to be related to known variability including the NAO and multiyear variability of North Atlantic temperatures and associated climate including MSLP in the hurricane development region.
Regions with an RPC greater than one suggest that the real world is more predictable than the models. This has important implications:
Model ensemble members are generally not potential realizations of the true evolution of the climate system. Instead, each ensemble member contains some predictable signal but with too much noise.
Skilful and reliable forecasts may be obtained using a large ensemble. The ensemble mean removes the excessive noise, leaving the predictable component, but variance corrections to the ensemble mean and members are required to ensure that the predictable component in the resultant model forecast is equivalent to that in the observations.
Many probabilistic and deterministic skill measures applied to the raw hindcasts underestimate the potential skill.
Model-based estimates of predictability diagnosed by taking an individual model run as the truth [e.g., Boer et al., 2013; Branstator et al., 2012; Collins et al., 2006; Dunstone and Smith, 2010; Tietsche et al., 2014], which is often seen as the upper limit of predictability, may underestimate the true predictability.
Techniques aimed at improving reliability by increasing ensemble spread, including stochastic physics [e.g., Weisheimer et al., 2011], may exacerbate problems where the models are underconfident. Indeed, a recent study [Ho et al., 2013] showed that at longer lead times the model spread appears to be too large in many regions. It may ultimately be better to reduce errors rather than increase spread to address model overconfidence.
Though not assessed here, it is possible that RPC could be greater than one in climate change projections, in which case conclusions relating to the role of internal variability [e.g., Deser et al., 2014] may need revising.
Further work is needed to understand why the predictable component is smaller in models than in reality in some regions. One possibility is that the model atmosphere is not constrained strongly enough by the relevant drivers of predictability. Indeed, there is mounting evidence that models respond too weakly to North Atlantic sea surface temperatures. This is shown by direct analysis of model simulations [Mehta et al., 2000; Rodwell and Folland, 2002; Gastineau et al., 2013] and has also been inferred from a simple theoretical explanation of the lagged response to changes in solar radiation [Scaife et al., 2013]. Evidence suggests that the atmospheric response to North Atlantic SSTs may be stronger in higher-resolution models that resolve SST fronts in the Gulf Stream region [Minobe et al., 2008]. Future higher-resolution models may therefore yield improved levels of skill for a given ensemble size.
Acknowledgments
We acknowledge the World Climate Research Programme's Working Group on Coupled Modeling, which is responsible for CMIP, and we thank the climate modeling groups (listed in section 2 of this paper) for producing and making available their model output. For CMIP the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. This work was supported by the joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101) and the EU FP7 SPECS project.
Supporting Information
Readme
Tables S1 and S2 and Figures S1–S8
References
- Alexander MA, Bladé I, Newman M, Lanzante JR, Lau N. Scott JD. The atmospheric bridge: The influence of ENSO teleconnections on air–sea interaction over the global oceans. J. Clim. 2002;15:2205–2231. [Google Scholar]
- Boer GJ. Decadal potential predictability of 21st century climate. Clim. Dyn. 2011;36:1119–1133. doi: 10.1007/s00382-010-0747-9. [Google Scholar]
- Boer GJ, Kharin VV. Merryfield WJ. Decadal predictability and forecast skill. Clim. Dyn. 2013;41:1817–1833. doi: 10.1007/s00382-013-1705-0. [Google Scholar]
- Branstator G, Teng H, Meehl GA, Kimoto M, Knight JR, Latif M. Rosati A. Systematic estimates of initial-value decadal predictability for six AOGCMs. J. Clim. 2012;25:1827–1846. doi: 10.1175/JCLI-D-11-00227.1. [Google Scholar]
- Collins M, et al. Interannual to decadal climate predictability in the North Atlantic: A multi-model ensemble study. J. Clim. 2006;19:1195–1203. doi: 10.1175/JCLI3654.1. [Google Scholar]
- Corti S, Weisheimer A, Palmer TN, Doblas-Reyes F. Magnusson L. Reliability of decadal predictions. Geophys. Res. Lett. 2012;39 L21712, doi: 10.1029/2012GL053354. [Google Scholar]
- Deser C, Phillips AS, Alexander MA. Smoliak BV. Projecting North American climate over the next 50 years: Uncertainty due to internal variability. J. Clim. 2014;27:2271–2296. doi: 10.1175/JCLI-D-13-00451.1. [Google Scholar]
- Doblas-Reyes FJ, García-Serrano J, Lienert F, Biescas AP. Rodrigues LRL. Seasonal climate predictability and forecasting: Status and prospects. WIREs Clim. Change. 2013;4:245–268. doi: 10.1002/wcc.217. [Google Scholar]
- Dunstone NJ. Smith DM. Impact of atmosphere and sub-surface ocean data on decadal climate prediction. Geophys. Res. Lett. 2010;37 L02709, doi: 10.1029/2009GL041609. [Google Scholar]
- Dunstone NJ, Smith DM. Eade R. Multi-year predictability of the tropical Atlantic atmosphere driven by the high latitude North Atlantic Ocean. Geophys. Res. Lett. 2011;28 L14701, doi: 10.1029/2011GL047949. [Google Scholar]
- Eade R, Hamilton E, Smith DM, Graham RJ. Scaife AA. Forecasting the number of extreme daily events out to a decade ahead. J. Geophys. Res. 2012;117 D21110, doi: 10.1029/2012JD018015. [Google Scholar]
- Eckel FA. Walters MK. Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Weather Forecasting. 1998;13:1132–1147. doi: 10.1175/1520-0434(1998)013<1132:CPQPFB>2.0.CO;2. [Google Scholar]
- Frankignoul C, Gastineau G. Kwon Y-O. The influence of the AMOC variability on the atmosphere in CCSM3. J. Clim. 2013;26:9774–9790. doi: 10.1175/JCLI-D-12-00862.1. [Google Scholar]
- Gangsto R, Weigel AP, Liniger MA. Appenzeller C. Methodological aspects of the validation of decadal predictions. Clim. Res. 2013;55:181–200. doi: 10.3354/cr01135. [Google Scholar]
- Gastineau G, D'Andrea F. Frankignoul C. Atmospheric response to the North Atlantic Ocean variability on seasonal to decadal time scales. Clim. Dyn. 2013;40:2311–2330. doi: 10.1007/s00382-012-1333-0. [Google Scholar]
- Gneiting T, Raftery AE, Westveld AH. Goldman T. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 2005;133:1098–1118. doi: 10.1175/MWR2904.1. [Google Scholar]
- Goddard L, et al. A verification framework for interannual-to-decadal predictions experiments. Clim. Dyn. 2012;40:245–272. doi: 10.1007/s00382-012-1481-2. [Google Scholar]
- Goldenberg SB, Landsea CW, Mestas-Nunez AM. Gray WM. The recent increase in Atlantic hurricane activity: Causes and implications. Science. 2001;293:474–479. doi: 10.1126/science.1060040. doi: 10.1126/science.1060040. [DOI] [PubMed] [Google Scholar]
- Hamill TM. Colucci SJ. Verification of Eta–RSM short-range ensemble forecasts. Mon. Weather Rev. 1997;125:1312–1327. doi: 10.1175/1520-0493(1997)125<1312:VOERSR>2.0.CO;2. [Google Scholar]
- Hamilton E, Eade R, Graham RJ, Scaife AA, Smith DM, Maidens A. MacLachlan C. Forecasting the number of extreme daily events on seasonal timescales. J. Geophys. Res. 2012;117 D03114, doi: 10.1029/2011JD016541. [Google Scholar]
- Hawkins E. Sutton R. The potential to narrow uncertainty in projections of regional precipitation change. Clim. Dyn. 2011;37:407–418. doi: 10.1007/s00382-010-0810-6. [Google Scholar]
- Ho CK, Hawkins E, Shaffrey L, Brocker J, Hermanson L, Murphy JM, Smith DM. Eade R. Examining reliability of seasonal to decadal sea surface temperature forecasts: The role of ensemble dispersion. Geophys. Res. Lett. 2013;40:5770–5775. doi: 10.1002/2013GL057630. [Google Scholar]
- International CLIVAR Project Office (ICPO) 2011. Data and bias correction for decadal climate predictions , International CLIVAR Project Office, CLIVAR Publication Series No. 150, 6 pp. [Available at http://eprints.soton.ac.uk/171975/1/150_Bias_Correction.pdf .]
- Jones PD, Jonsson T. Wheeler D. Extension to the North Atlantic Oscillation using early instrumental pressure observations from Gibralter and South-west Iceland. Int. J. Clim. 1997;17:1433–1450. doi: 10.1002/(SICI)1097-0088(19971115)17:13<1433::AID-JOC203>3.0.CO;2-P. [Google Scholar]
- Kirtman B, Anderson D, Brunet G, Kang IS, Scaife AA, Smith DM. Prediction from weeks to decades. In: Climate Science for Serving Society. Dordrecht, Netherlands: Springer; 2013. pp. 205–235. doi: 10.1007/978-94-007-6692-1_8.
- Knight JR, Folland CK, Scaife AA. Climate impacts of the Atlantic Multidecadal Oscillation. Geophys. Res. Lett. 2006;33:L17706. doi: 10.1029/2006GL026242.
- Knight JR, et al. Predictions of climate several years ahead using an improved decadal prediction system. J. Clim. 2014, in press.
- Kumar A. Finite samples and uncertainty estimates for skill measures for seasonal prediction. Mon. Weather Rev. 2009;137:2622–2631. doi: 10.1175/2009MWR2814.1.
- Kumar A, Peng P, Chen M. Is there a relationship between potential and actual skill? Mon. Weather Rev. 2014;142:2220–2227. doi: 10.1175/MWR-D-13-00287.1.
- Lorenz EN. Deterministic nonperiodic flow. J. Atmos. Sci. 1963;20:130–141. doi: 10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
- MacLachlan C, et al. Global seasonal forecast system version 5 (GloSea5): A high resolution seasonal forecast system. Q. J. R. Meteorol. Soc. 2014. doi: 10.1002/qj.2396.
- Meehl GA, et al. Decadal climate prediction: An update from the trenches. Bull. Am. Meteorol. Soc. 2014. doi: 10.1175/BAMS-D-12-00241.1.
- Mehta VM, Suarez MJ, Manganello JV, Delworth TL. Oceanic influence on the North Atlantic Oscillation and associated Northern Hemisphere climate variations: 1959–1993. Geophys. Res. Lett. 2000;27:121–124. doi: 10.1029/1999GL002381.
- Minobe S, Yoshida AK, Komori N, Xie S-P, Small RJ. Influence of the Gulf Stream on the troposphere. Nature. 2008;452:206–209. doi: 10.1038/nature06690.
- Murphy AH. Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon. Weather Rev. 1988;116:2417–2424. doi: 10.1175/1520-0493(1988)116<2417:SSBOTM>2.0.CO;2.
- Palmer TN, et al. Development of a European Multi-Model Ensemble System for Seasonal to Inter-Annual Prediction (DEMETER). Bull. Am. Meteorol. Soc. 2004;85:853–872. doi: 10.1175/BAMS-85-6-853.
- Pohlmann H, Smith DM, Balmaseda MA, Keenlyside NS, Masina S, Matei D, Muller WA, Rogel P. Predictability of the mid-latitude Atlantic meridional overturning circulation in a multi-model system. Clim. Dyn. 2013;41:775–785. doi: 10.1007/s00382-013-1663-6.
- Raftery AE, Gneiting T, Balabdaoui F, Polakowski M. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 2005;133:1155–1174. doi: 10.1175/MWR2906.1.
- Riddle EE, Butler AH, Furtado JC, Cohen JL, Kumar A. CFSv2 ensemble prediction of the wintertime Arctic Oscillation. Clim. Dyn. 2013;41:1099–1116. doi: 10.1007/s00382-013-1850-5.
- Robson JI, Sutton RT, Smith DM. Initialised decadal predictions of the rapid warming of the North Atlantic Ocean in the mid 1990s. Geophys. Res. Lett. 2012;39:L19713. doi: 10.1029/2012GL053370.
- Robson JI, Sutton RT, Smith DM. Decadal predictions of the cooling and freshening of the North Atlantic in the 1960s and the role of ocean circulation. Clim. Dyn. 2014;42:2353–2365. doi: 10.1007/s00382-014-2115-7.
- Rodwell M, Folland C. Atlantic air-sea interaction and seasonal predictability. Q. J. R. Meteorol. Soc. 2002;128:1413–1443. doi: 10.1002/qj.200212858302.
- Roulston MS, Smith LA. Combining dynamical and statistical ensembles. Tellus A. 2003;55:16–30. doi: 10.1034/j.1600-0870.2003.201378.x.
- Scaife AA, Ineson S, Knight JR, Gray L, Kodera K, Smith DM. A mechanism for lagged North Atlantic climate response to solar variability. Geophys. Res. Lett. 2013;40:434–439. doi: 10.1002/grl.50099.
- Scaife AA, et al. Skilful long range prediction of European and North American winters. Geophys. Res. Lett. 2014;41:2514–2519. doi: 10.1002/2014GL059637.
- Smith DM, Eade R, Dunstone NJ, Fereday D, Murphy JM, Pohlmann H, Scaife AA. Skilful multi-year predictions of Atlantic hurricane frequency. Nat. Geosci. 2010;3:846–849. doi: 10.1038/NGEO1004.
- Smith DM, Scaife AA, Kirtman B. What is the current state of scientific knowledge with regard to seasonal and decadal forecasting? Environ. Res. Lett. 2012;7:015602. doi: 10.1088/1748-9326/7/1/015602.
- Smith DM, Eade R, Pohlmann H. A comparison of full-field and anomaly initialization for seasonal to decadal climate prediction. Clim. Dyn. 2013;41:3325–3338. doi: 10.1007/s00382-013-1683-2.
- Stockdale TN. Coupled ocean–atmosphere forecasts in the presence of climate drift. Mon. Weather Rev. 1997;125:809–818. doi: 10.1175/1520-0493(1997)125<0809:COAFIT>2.0.CO;2.
- Sutton R, Hodson D. Climate response to basin-scale warming and cooling of the North Atlantic Ocean. J. Clim. 2007;20(5):891–907. doi: 10.1175/JCLI4038.1.
- Sutton RT, Dong B. Atlantic Ocean influence on a shift in European climate in the 1990s. Nat. Geosci. 2012;5:788–792. doi: 10.1038/ngeo1595.
- Taylor KE, Stouffer RJ, Meehl GA. An overview of CMIP5 and the experiment design. Bull. Am. Meteorol. Soc. 2012;93(4):485–498. doi: 10.1175/BAMS-D-11-00094.1.
- Tietsche S, Day JJ, Guemas V, Hurlin WJ, Keeley SPE, Matei D, Msadek R, Collins M, Hawkins E. Seasonal to interannual Arctic sea-ice predictability in current GCMs. Geophys. Res. Lett. 2014;41:1035–1043. doi: 10.1002/2013GL058755.
- Trenberth KE, Caron JM. The Southern Oscillation revisited: Sea level pressures, surface temperatures and precipitation. J. Clim. 2000;13:4358–4365. doi: 10.1175/1520-0442(2000)013<4358:TSORSL>2.0.CO;2.
- Wang B, et al. Advance and prospectus of seasonal prediction: Assessment of the APCC/CliPAS 14-model ensemble retrospective seasonal prediction (1980–2004). Clim. Dyn. 2009;33:93–117. doi: 10.1007/s00382-008-0460-0.
- Weisheimer A, Palmer TN, Doblas-Reyes FJ. Assessment of representations of model uncertainty in monthly and seasonal forecast ensembles. Geophys. Res. Lett. 2011;38:L16703. doi: 10.1029/2011GL048123.
- Wilks DS. Statistical Methods in the Atmospheric Sciences. 2nd ed. San Diego, Calif.: Academic Press; 2006.
- Williams RM, Ferro CAT, Kwasniok F. A comparison of ensemble post-processing methods for extreme events. Q. J. R. Meteorol. Soc. 2013;140(680):1112–1120. doi: 10.1002/qj.2198.
- Wilson C, Sinha B, Williams RG. The effect of ocean dynamics and orography on atmospheric storm tracks. J. Clim. 2009;22:3689–3702. doi: 10.1175/2009JCLI2651.1.
- Woollings T, Gregory JM, Pinto JG, Reyers M, Brayshaw DJ. Response of the North Atlantic storm track to climate change shaped by ocean–atmosphere coupling. Nat. Geosci. 2012;5:313–317. doi: 10.1038/ngeo1438.
- Yeager S, Karspeck A, Danabasoglu G, Tribbia J, Teng H. A decadal prediction case study: Late 20th century North Atlantic Ocean heat content. J. Clim. 2012;25:5173–5189. doi: 10.1175/JCLI-D-11-00595.1.
- Younas W, Tang Y. PNA predictability at various time scales. J. Clim. 2013;26:9090–9114. doi: 10.1175/JCLI-D-12-00609.1.
- Zhang R, Delworth TL. Impact of Atlantic multidecadal oscillations on India/Sahel rainfall and Atlantic hurricanes. Geophys. Res. Lett. 2006;33:L17712. doi: 10.1029/2006GL026267.
Associated Data
Supplementary Materials
Readme
Tables S1 and S2 and Figures S1–S8