Significance
Although El Niño events, characterized by anomalous episodic warmings of the eastern equatorial Pacific, can trigger disasters in various parts of the globe, reliable forecasts of their magnitude are still limited to about 6 mo ahead. A significant extension of this prewarning time would be instrumental for mitigating some of the worst damage. Here we introduce an approach relying on information entropy, which roughly doubles the prewarning time. The approach is based on our finding that the entropy in one calendar year exhibits a strong correlation with the magnitude of an El Niño that starts in the following year and thus allows us to forecast the onset and the magnitude of an El Niño event 1 y in advance.
Keywords: ENSO, system complexity, entropy, spring barrier, forecasting
Abstract
The El Niño Southern Oscillation (ENSO) is one of the most prominent interannual climate phenomena. Early and reliable ENSO forecasting remains a crucial goal, due to its serious implications for the economy, society, and ecosystems. Despite the development of various dynamical and statistical prediction models in recent decades, the “spring predictability barrier” remains a great challenge for long-lead-time (over 6 mo) forecasting. To overcome this barrier, here we develop an analysis tool, System Sample Entropy (SysSampEn), to measure the complexity (disorder) of the system composed of temperature anomaly time series in the Niño 3.4 region. When applying this tool to several near-surface air temperature and sea surface temperature datasets, we find that in all datasets a strong positive correlation exists between the magnitude of El Niño and the previous calendar year’s SysSampEn (complexity). We show that this correlation allows us to forecast the magnitude of an El Niño with a prediction horizon of 1 y and high accuracy (i.e., root-mean-square error = 0.23 °C for the average of the individual datasets’ forecasts). For the 2018 El Niño event, our method forecasted a weak El Niño with a magnitude of 1.11 ± 0.23 °C. The framework presented here not only facilitates long-term forecasting of the El Niño magnitude but can potentially also be used as a measure for the complexity of other natural or engineering complex systems.
The El Niño Southern Oscillation (ENSO), the interannual fluctuation between anomalous warm and cold conditions in the tropical Pacific, is one of the most influential coupled ocean–atmosphere climate phenomena on Earth (1–4). The warm phase of ENSO (El Niño) is characterized by an abnormal warming of the eastern equatorial Pacific, which occurs about every 2 to 7 y. The Oceanic Niño Index (5) (ONI) is the primary indicator that the National Oceanic and Atmospheric Administration uses to monitor and identify ENSO events. It is the 3-mo running mean of sea surface temperature (SST) anomalies in the Niño 3.4 region (5°S–5°N, 170°W–120°W, shown in Fig. 1 as the region inside the pink rectangle). An El Niño event is defined to take place if the ONI is at or above 0.5 °C for at least 5 consecutive months (red in Fig. 2A). Here we use the value of the highest peak of the ONI during an El Niño event to quantify its magnitude.
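The event definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `oni_from_monthly` and `el_nino_events` are hypothetical, the input is assumed to be a sequence of monthly Niño 3.4 SST anomalies in °C, and the default threshold of 0.5 °C and minimum duration of 5 months follow the operational ONI convention.

```python
import numpy as np

def oni_from_monthly(sst_anomalies):
    """3-month running mean of monthly SST anomalies (output is 2 values shorter)."""
    x = np.asarray(sst_anomalies, dtype=float)
    return np.convolve(x, np.ones(3) / 3.0, mode="valid")

def el_nino_events(oni, threshold=0.5, min_months=5):
    """Return (first_index, last_index, peak_value) for each El Niño event.

    An event is a run of at least `min_months` consecutive values at or above
    `threshold`; the peak of the run is used as the event magnitude.
    """
    values = [float(v) for v in oni]
    events = []
    start = None
    # A sentinel below any threshold closes a run that reaches the end of the record.
    for i, v in enumerate(values + [float("-inf")]):
        if v >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_months:
                events.append((start, i - 1, max(values[start:i])))
            start = None
    return events
```

For example, `el_nino_events(oni_from_monthly(monthly_anomalies))` would list the events in a monthly record, with the third tuple entry giving the magnitude used in the rest of the paper.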
Fig. 1.
The Niño 3.4 region. The red circles indicate the 22 nodes in the Niño 3.4 region with a spatial resolution of . The curves are examples of the temperature anomaly time series for 3 nodes in the Niño 3.4 region for one specific year, and several examples of their subsequences are marked in black.
Fig. 2.
Correlation between SysSampEn and El Niño magnitude. (A) The heights of the blue rectangles indicate the values of the SysSampEn (left scale) for the calendar years preceding El Niño events, calculated from ERA-Interim, by using the set of parameters ( d, , d, and ) that correspond to the highest correlation with El Niño magnitudes. The red curve is the ONI and the red shading indicates El Niño periods (right scale). (B) Scatter plot of the maximal El Niño magnitude versus the previous calendar year’s SysSampEn (blue rectangles in A). The gray region indicates values of the SysSampEn that predict a maximal ONI of less than 0.5 °C and thus, by definition, non-El Niño events. The green dashed line shows the least-squares best-fit line. (C) The y coordinate of each purple dot is the averaged correlation for parameter combinations with accuracy no less than a certain level (i.e., its x coordinate) in both the spatial asynchrony and the temporal disorder tests. The correlation between SysSampEn and the El Niño magnitude increases monotonically with increasing accuracy level. The calculation of the accuracy level is independent of any El Niño events, and thus the strong correlation between the SysSampEn and the El Niño magnitudes emerges naturally without fitting.
El Niño has been reported to affect marine ecosystems, commercial fisheries, agriculture, and public safety and even to bring extreme weather conditions to many parts of the globe (6–14). Thus, understanding the underlying mechanism and predicting El Niño are of great importance for humanity. Numerous models, dynamical as well as statistical ones, have been developed to simulate and forecast El Niño events. Dynamical models (15–24) express mathematically the physical equations of the ocean–atmosphere system. In contrast, statistical model (25, 26) forecasts of El Niño are based on data-driven analyses. During the past decades, the prediction of El Niño has made great progress and skillful forecasts at shorter lead times (up to around 6 mo) are possible (27–29). However, both types of models reveal very low predictability before and during boreal spring (February to May). This is the so-called spring predictability barrier (SPB) (30–33).
Recently, several approaches based on climate networks were developed to forecast the onsets of El Niño around 1 y in advance (34–37). One of these approaches (34) has correctly forecasted all El Niño onsets or their absence since 2012. However, this method is unable to predict the magnitude of the event. Predicting the magnitude is crucial since a stronger El Niño usually causes more extreme events (e.g., floods, droughts, or severe storms) which have serious consequences for economies, societies, and ecosystems. In particular, the El Niño events which started in 1997 and 2014 exhibited relatively high magnitudes and had major impacts on the dynamics and structure of tropical and temperate ecosystems worldwide (38). To fill this gap, here we develop an analysis tool, System Sample Entropy (SysSampEn), to quantify the spatiotemporal disorder degree of temperature variations in the Niño 3.4 region and to forecast the El Niño magnitude before the SPB. Based on a calendar year’s data we forecast whether an El Niño will start in the following year. Once the SysSampEn approach forecasts the occurrence of an El Niño onset, we are able to forecast its magnitude with high skill (i.e., correlation and C between the forecasted and observed magnitudes for the El Niño events that occurred during the last 35 y). We note that the SysSampEn approach roughly doubles the lead time at comparable skill. The skill of our El Niño magnitude forecast, based on the previous year’s SysSampEn, and thus with a lead time of about 1 y, is comparable to the best state-of-the-art model forecasts which start in June (i.e., with 6-mo lead time) and predict the same year’s boreal winter (November through January) ONI (39, 40).
SysSampEn
We define the SysSampEn for a complex system as a generalization of sample entropy (SampEn) and Cross-SampEn (41). SampEn was introduced as a modification of approximate entropy (42, 43). It measures the complexity related to the Kolmogorov entropy (44), the rate of information production, of a process represented by a single time series. The Cross-SampEn was introduced to measure the degree of asynchrony or dissimilarity between 2 related time series (41, 45). Both have been widely used in physiological fields, for example to make early diagnoses before the clinical signs of neonatal sepsis by analyzing heart rate variability (46), to implement an automatic diagnosis of epileptic electroencephalogram (47), and to discriminate different sensory conditions by analyzing human postural sway data (48).
However, a complex system such as the climate system is usually composed of several related time series (e.g., curves in Fig. 1). Therefore, here we introduce the SysSampEn as a measure of the system complexity, to quantify simultaneously the mean temporal disorder degree of all of the time series in a complex system and the asynchrony among them. Specifically, it approximately equals the negative natural logarithm of the conditional probability that 2 subsequences similar (within a certain tolerance range) for $m$ consecutive data points remain similar for the next $p$ points, where the subsequences can originate from either the same or different time series (e.g., black curves in Fig. 1), that is,

$$\mathrm{SysSampEn}(m, p, l_{\mathrm{eff}}, \gamma) = -\ln\frac{A}{B}, \tag{1}$$

where $A$ is the number of pairs of similar subsequences of length $m+p$, $B$ is the number of pairs of similar subsequences of length $m$, $l_{\mathrm{eff}} \le l$ is the number of data points used in the calculation for each time series of length $l$, and $\gamma$ is a constant which determines the tolerance range. The detailed definition of SysSampEn for an arbitrary complex system composed of $N$ time series is described in Data and Methods. When , , and , our definition is equivalent to the classical SampEn (41). As is the case for SampEn and Cross-SampEn, before the SysSampEn can be used as an effective tool, appropriate parameter values have to be identified since only certain value combinations can be used to estimate a system’s complexity with considerable accuracy. To better demonstrate the mathematical meaning of our SysSampEn, we show in SI Appendix, Fig. S1 an example (the logistic map) of applying the SysSampEn to estimate the system complexity and compare it with the Lyapunov exponents. We find that higher (lower) values of the SysSampEn are strongly associated with higher (lower) Lyapunov exponents, which reveals that the SysSampEn can well capture the complexity of the system. However, we note that the effective parameter combinations may be different in different complex systems. Here, for the climate system we study for El Niño prediction, we choose $m$ to be 30 d or 60 d and $p$ to be 15 d or 30 d since El Niño is an interannual phenomenon.
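As background for the logistic-map comparison, the Lyapunov exponent of the map x → r·x·(1 − x) can be estimated as the orbit average of ln|r(1 − 2x)|, which is the standard textbook formula. The sketch below is only an illustration of that reference quantity and is not taken from SI Appendix; the function name and the parameters `x0`, `n_burn`, and `n_iter` are illustrative choices.

```python
import math

def logistic_lyapunov(r, x0=0.2, n_burn=1000, n_iter=20000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)."""
    x = x0
    for _ in range(n_burn):
        # Discard the transient so the average is taken on the attractor.
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        # The derivative of the map at x is r*(1 - 2x).
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_iter
```

Negative values indicate a stable fixed point or periodic orbit, positive values indicate chaos; a complexity measure that tracks this exponent should therefore increase when r moves from the periodic into the chaotic regime.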
Strong Positive Correlation between the El Niño Magnitude and Its Previous Calendar Year’s SysSampEn.
We calculate the SysSampEn of the climate system composed of the near-surface air or SST anomaly time series in the Niño 3.4 region and find a strong positive correlation between El Niño magnitude and the SysSampEn of its previous calendar year (Fig. 2 A and B). This positive correlation is significant ( on average) and robust across all of the analyzed datasets [ERA-Interim 1,000 hPa air temperature (49) (ERA-Interim), ERA5 1,000 hPa air temperature (50) (ERA5), ERA5 SST, and JRA55-do SST (51)] (SI Appendix, Fig. S2).
In the following, we present our results based on the dataset of ERA-Interim, which gives the highest correlation. For a given calendar year between 1984 and 2018, we construct a system composed of temperature anomaly time series in the Niño 3.4 region (Fig. 1) with a spatial resolution of .
First, we determine the parameter combinations for the SysSampEn, which enable an accurate estimation of the system’s complexity. We do this by performing 2 tests (for details see Data and Methods), which determine, for a given parameter combination, the ability of the SysSampEn to discriminate between higher and lower disordered systems. In the temporal disorder test, we add random numbers to the real temperature data, while in the spatial asynchrony test we compare 2 systems, one which is constructed from neighboring points on the globe and one which is constructed from randomly chosen points on the globe. An accurate complexity measure should be able to recognize the higher disorder in the more random system and thus assign a higher SysSampEn value to it. We define accuracy as the percentage of correct assignments. Thus, using suitable parameter combinations for the SysSampEn we can quantify the temporal as well as the spatial disorder in the system.
Surprisingly, we find that the previous calendar year’s SysSampEn exhibits a strong positive correlation with the magnitude of El Niño, if the parameter combination for the SysSampEn can quantify the system complexity with good accuracy. Fig. 2C demonstrates in one example, d and d, that with changing the values of and in Eq. 1 the Pearson correlation () between the El Niño magnitude and the previous calendar year’s SysSampEn (e.g., blue rectangles in Fig. 2A) increases significantly with the accuracy level. Note that the accuracies are calculated fully independently of any El Niño magnitude analyses or forecasts. Thus, the strong correlation between the SysSampEn and the El Niño magnitude is naturally obtained from the parameter combinations, which enable the SysSampEn to quantify the system complexity with high accuracy. In other words, the high predictability of the El Niño magnitude before the SPB is not the result of overfitting but it originates from the strong and robust correlation between system complexity and El Niño magnitude.
We also find that the pattern of the SysSampEn between 1984 and 2018 is independent of the data resolution and highly consistent for different parameter combinations which provide high accuracy (SI Appendix, Fig. S3 and Table S1). In particular, the correlation between the El Niño magnitude and the previous calendar year’s SysSampEn with different effective ( accuracy level) parameters are all significantly high (the average is ), while the best correlation is obtained for d, d, d, and (Fig. 2B).
We performed the same analysis on the other datasets and obtained similar results (SI Appendix, Figs. S4–S6). We also present in SI Appendix, Fig. S2 the scatter plots of the El Niño magnitude versus the previous calendar year’s SysSampEn that give the highest for each of the other 3 datasets. The correlation is also significantly high for the other 3 datasets, and the average when using all high-accuracy parameter combinations of the 4 datasets (SI Appendix, Table S1) is . Note that the 2009 El Niño is the only event missed in the onset forecasts (discussed below) and is an exception in the linear relationship.
To obtain the best forecasting performance, we choose the SysSampEn parameters by first conducting an accuracy test and only accepting parameter combinations which lead to a high accuracy (accuracy level for air temperature and for SST) in both the spatial asynchrony and the temporal disorder tests. From these high-accuracy parameter combinations we choose in the second step the one which gives the highest correlation with the magnitudes of the past El Niño events. We repeat this for all datasets. Table 1 shows the parameters that suggest the highest for El Niño events before 2018 in the different datasets.
Table 1.
Values of parameters that suggest the highest correlation between El Niño magnitude and its previous calendar year’s SysSampEn during the period between 1984 and 2017
| Data type | Name | Resolution | Subsequence length (d) | Length increase (d) | Tolerance constant | Data points used (d) | Correlation | 2018 forecast (°C) |
| T 1,000 hPa | ERA-Interim | | 60 | 15 | 9 | 345 | 0.99 | 1.67 |
| T 1,000 hPa | ERA5 | | 30 | 30 | 8 | 330 | 0.87 | 0.58 |
| SST | ERA5 | | 30 | 30 | 5 | 330 | 0.86 | 1.09 |
| SST | JRA55-do | | 30 | 30 | 5 | 360 | 0.87 | 1.09 |
| Average | | | | | | | 0.90 | 1.11 |
We note that, when we repeat our calculation using the average SampEn per node or the average Cross-SampEn for each pair of nodes in the Niño 3.4 region instead, we get less significant correlations ( on average) than with the SysSampEn approach.
Forecasts of El Niño Magnitudes and Onsets.
Based on the substantial correlations between SysSampEn and El Niño magnitude, we develop efficient hindcasting and forecasting methods for both the El Niño onsets and magnitudes (introduced in Data and Methods).
To show the high predictability of the El Niño onset before the SPB, we compose a new index (rectangles in Fig. 3A) by substituting the value of the SysSampEn for each calendar year into the best-fitting linear functions (green dashed lines in Fig. 2B and SI Appendix, Fig. S2) and then taking the average over all of the 4 datasets. Thus the new index has the unit of degrees Celsius. We find that the value of this index for one specific calendar year can be used to forecast the presence or absence of an El Niño onset in the following year with very good accuracy, that is, 9 out of 10 correct forecasts of El Niño onsets (dark blue rectangles), with only one missed (pink rectangle) and 21 out of 24 correct forecasts of El Niño onset absence years (transparent rectangles), with 3 missed (gray rectangles). The detailed algorithm is introduced in Data and Methods.
Fig. 3.
Forecasting El Niño onsets and magnitudes. (A) The value of the onset forecasting index (average forecast over the 4 datasets) is shown as the height of the rectangles and is used to forecast the occurrence or absence of an El Niño onset in the following year. If the index value is C and the observed ONI in December is below C, we forecast the onset of an El Niño in the following year. The blue rectangles show the correctly forecasted El Niño onsets, the pink rectangle indicates a missed El Niño event, the gray rectangles indicate false alarms, and the transparent rectangles show when the absence of an El Niño onset was correctly forecasted. (B) Observed temperature versus the leave-one-out hindcasted temperature for the El Niño magnitudes (orange dots) before 2018. The obtained RMSE is 0.23 °C. The forecasted magnitude (1.11 °C) of the 2018 El Niño event is plotted as a light green dot with an error bar of . The + symbols indicate the hindcasted (forecasted) values obtained by using each of the 4 datasets. (C) Forecasts of the 2004, 2006, and 2014 El Niño magnitudes based only on past information. The error bar for each forecasted El Niño event (blue points) equals (i.e., C, C, and C for the 2004, 2006, and 2014 events, respectively) and is calculated from the leave-one-out hindcasts that lie in the past of the event considered. Thus, the forecasted value, as well as its error bar (i.e., ), is based only on the event’s past information. The red dots show the observed magnitudes and are within the error bars. The forecasted 2018 magnitude and its error bar are shown in light green.
To demonstrate the high predictability of the El Niño magnitudes before the SPB, we first perform leave-one-out hindcasts (described in Data and Methods) of the magnitudes for all of the El Niño events between 1984 and 2017. For each dataset, we use the parameter combination in the function of SysSampEn that gives the highest correlation between SysSampEn and the magnitudes of the El Niño events before 2018 (Table 1). The observed El Niño magnitudes and hindcasted magnitudes are shown in Fig. 3B. Compared to the real data, we find that our hindcasting method is accurate, with a root-mean-square error (RMSE) of 0.23 °C. This indicates that the SysSampEn method has the potential for skillful El Niño magnitude forecasts with a prediction horizon of 1 y.
Second, we perform magnitude forecasts for the 2004, 2006, and 2014 El Niño events by using only data from each event’s past (Data and Methods) and find that the differences between the observed and forecasted values are within (Fig. 3C). These results indicate that can be regarded as an error bar. The RMSE is obtained by leave-one-out hindcasting applied only to the past of the event considered (e.g., for the 2004 El Niño it depends only on the period 1984 through 2003). Analogously, the SysSampEn parameters also depend only on the past of the event considered and are given in SI Appendix, Tables S2–S4. Note that for later El Niño events, as more data become available for our method, the estimated RMSEs become smaller (Fig. 3C). The forecast performance for the last 3 El Niño events demonstrates the ability of our method to forecast the El Niño magnitude and to provide correct error estimates.
Next, we apply the SysSampEn method to forecast the magnitude of the 2018 El Niño event, based only on data up to 2017. The SysSampEn parameters used are given in Table 1; we obtain a magnitude of 1.11 °C, with an error bar of 0.23 °C.
Discussion
We have introduced the SysSampEn for complex systems and applied it to estimate the spatiotemporal disorder degree of temperature variations in the Niño 3.4 region. We find that a low degree of horizontal synchronization and a high degree of random temporal variations in the SST or the near-surface air temperature are precursors for a strong El Niño. Our results reveal a high predictability of both the El Niño onsets and magnitudes already before the boreal spring of the El Niño onset year. Between 1984 and 2018 our method correctly predicted 9 out of 10 El Niño onsets, while the absence of an El Niño onset was correctly predicted in 21 out of 24 cases. For the magnitude of the correctly predicted El Niño events we obtain a forecast RMSE of 0.23 °C. In particular, for the last El Niño, which started in 2018, our method predicts a weak El Niño with a magnitude of 1.11 ± 0.23 °C, based only on data until the calendar year 2017.
We note that for shorter close to half a year the correlations between El Niño magnitudes and SysSampEn are still high for certain ranges of parameters (SI Appendix, Figs. S3–S6 C). This indicates that even earlier El Niño forecasts up to 18 mo in advance could be achieved, but with lower prediction skill.
As a possible mechanism underlying our method, we find some clues from the relationship between the near-surface ocean turbulence and the SST variations (52). Recently, it was discovered that a strong El Niño is related to intense ocean turbulence, which is characterized by large lateral diffusivity (53–55). An enhanced lateral diffusivity during an El Niño leads to weaker horizontal temperature gradients and higher horizontal mixing, which in turn results in a lower SysSampEn in the Niño 3.4 region. This conjecture is supported by our finding that the SysSampEn for El Niño years is inversely proportional to the El Niño magnitude, that is, for calendar years which include a strong El Niño, the SysSampEn tends to be lower (for details see SI Appendix, Fig. S7). Additionally, we also observe a memory effect in the dynamical evolution of the SysSampEn, that is, a smaller SysSampEn is more likely to be followed by a larger one and a larger one to be followed by a smaller one, as demonstrated in SI Appendix, Fig. S8. Thus a calendar year with a strong El Niño and thus lower SysSampEn is likely to be preceded by a calendar year with higher SysSampEn. We hypothesize that this higher SysSampEn might be caused by a weak surface lateral diffusivity during the calendar year before an El Niño onset, but further analyses based on climate models and observation data are needed. We note that the diffusivity of the ocean surface mesoscale turbulence in certain regions has just been found to be correlated with the Niño 3.4 index (55), which supports our hypothesis. We also notice that the percolation phase transition analysis was developed to study the influence of ENSO on climate and further help us in better predicting the subsequent events triggered by ENSO (56, 57).
The theoretical framework developed here not only improves the long-lead-time El Niño forecasting capability but could lead to improved forecasts or new insights when applied to other nonlinear and complex dynamical systems (58, 59).
Data and Methods
Data.
The ERA-Interim archive of the European Centre for Medium-Range Weather Forecasts (ECMWF) is available at https://apps.ecmwf.int/datasets/. ERA-Interim is a global atmospheric reanalysis starting from 1979 and is regularly updated. In the present work, we used the zero o’clock daily near-surface (1,000 hPa) temperature, downloaded with a spatial (zonal and meridional) resolution of . Data for years from 1979 to 2017 were downloaded on 4 October 2018, and data for the last year, 2018, were updated on 29 January 2019.
The ERA5 (https://climate.copernicus.eu/climate-reanalysis) is a climate reanalysis dataset developed through the Copernicus Climate Change Service (C3S). It is currently available for the period since 1979 within 3 mo of real time. The analysis field of ERA5 has a higher spatial resolution of 31 km and a higher temporal resolution of 1 h, compared to ERA-Interim. Data used in the present work is the zero o’clock daily near-surface (1,000 hPa) temperature, downloaded on 25 January 2019, and SST downloaded on 30 January 2019, with a spatial (zonal and meridional) resolution of .
The JRA55-do (https://esgf-node.llnl.gov/search/input4mips/) extends from 1958 to 2018 and is expected to be updated annually (around April each year). The SST field has a spatial resolution of and a temporal resolution of 1 d. Data used in the present work are the daily mean SST, downloaded on 8 November 2018, with a spatial (zonal and meridional) resolution of .
Data Preprocessing.
For each calendar year since 1984 (the first 5 y from 1979 to 1983 of the datasets ERA-Interim, ERA5, and ERA5 SST are used to calculate the first anomaly value for 1984), at each grid point in the Niño 3.4 region we calculate the anomalies by subtracting the climatological average from the actual temperature and then dividing by the climatological SD. We do this for each calendar day . For simplicity, leap days were excluded. The calculations of the climatological average and SD are based only on the past data up to the year .
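The preprocessing step can be sketched as follows for a single grid point. This is a minimal illustration under stated assumptions: the function name `expanding_anomalies` is hypothetical, the input is assumed to be an array with one row per year and one column per calendar day (leap days already removed), and the climatology for a given year is assumed to use only the years strictly before it, in line with the statement that 1979 to 1983 are used for the first anomaly values of 1984.

```python
import numpy as np

def expanding_anomalies(temps, n_init=5):
    """Standardized anomalies using only past years for the climatology.

    temps  : array of shape (n_years, n_days), one row per calendar year
    n_init : number of leading years reserved for the first climatology
    Rows for the first `n_init` years are returned as NaN.
    """
    temps = np.asarray(temps, dtype=float)
    anomalies = np.full(temps.shape, np.nan)
    for year in range(n_init, temps.shape[0]):
        past = temps[:year]                 # all earlier years, per calendar day
        clim_mean = past.mean(axis=0)
        clim_sd = past.std(axis=0)
        anomalies[year] = (temps[year] - clim_mean) / clim_sd
    return anomalies
```

Because the climatology grows by one year at each step, no information from the future of a given year enters its anomalies, which is what allows the method to be used for genuine forecasts.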
SysSampEn for an Arbitrary Complex System.
We first define the SysSampEn for an arbitrary system. Let us assume we have interdependent time series of length composing the system.
1) From each time series, we select subrecords of length , starting at each -th data point, that is, starting at , as long as . Thus a specific subsequence is . Then we select subsequences from each time series and construct a set of template vectors from the system, that is, . We assume that 2 vectors are close (similar) if their Euclidean distance (if , then ), where and are the SDs of time series and , respectively. defines the similarity criterion and is a nonzero constant.
2) To examine the probability that 2 time series which are close at data points still will be close at the next data points, we construct analogously another set by selecting subrecords of length . To make the number of template vectors of length equal to that of length , we choose . In order to reduce the parameter degrees of freedom and save calculation time, we take , then . We assume that 2 template vectors from the set are close if (if , then ).
3) The SysSampEn of the system is defined as , where is the number of close vector pairs from the set , is the number of close vector pairs from the set and is the number of days since 1 January of each calendar year, used in the calculation of SysSampEn.
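The three steps above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function name `sys_samp_en` and its argument names are hypothetical, the stride between subsequence starts is assumed equal to the length increase, the same start positions are used for both template lengths so that both sets have the same size, each unordered pair of templates is counted once with self-pairs excluded, and two templates are taken to be close when their Euclidean distance is below the tolerance constant times the product of the SDs of the two series they come from.

```python
import numpy as np

def sys_samp_en(system, m, p, gamma, l_eff=None):
    """Sketch of a system-level sample entropy, -ln(A / B).

    system : array of shape (n_series, length), one row per time series
    m      : template length
    p      : length increase (also used as the stride between starts)
    gamma  : tolerance constant
    l_eff  : number of leading data points used (default: all)
    """
    x = np.atleast_2d(np.asarray(system, dtype=float))
    if l_eff is not None:
        x = x[:, :l_eff]
    n_points = x.shape[1]
    starts = np.arange(0, n_points - (m + p) + 1, p)
    if len(starts) == 0:
        raise ValueError("time series too short for the chosen m and p")
    # SD of the series each template comes from, in template order.
    owner_sd = np.repeat(x.std(axis=1), len(starts))
    # Assumed tolerance: gamma times the product of the two series' SDs.
    tol = gamma * owner_sd[:, None] * owner_sd[None, :]
    upper = np.triu_indices(len(owner_sd), k=1)  # each pair once, no self-pairs

    def count_close(width):
        templates = np.stack([x[:, s:s + width] for s in starts], axis=1)
        templates = templates.reshape(-1, width)
        dist = np.linalg.norm(templates[:, None, :] - templates[None, :, :], axis=2)
        return int(np.count_nonzero((dist < tol)[upper]))

    b = count_close(m)        # close pairs of length m
    a = count_close(m + p)    # close pairs of length m + p
    if b == 0:
        return float("nan")
    if a == 0:
        return float("inf")
    return float(-np.log(a / b))
```

Since every longer template extends a shorter one with the same start, a pair that is close at the longer length is also close at the shorter length, so the returned value is never negative. A perfectly synchronized periodic system gives a value of zero, and added noise or loss of synchrony tends to raise it.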
Parameter Determination for the SysSampEn.
Here we demonstrate how to determine the parameters for our Niño 3.4 climate system by using the ERA-Interim data. For each calendar year, we define a system composed of 22 (red circles in Fig. 1) temperature anomaly time series of length 365 d.
1) We choose the vector lengths to be 30 d or 60 d, and the length increases to be 15 d or 30 d. We focus on a (bi)monthly timescale since El Niño is an interannual phenomenon.
2) The purpose of the SysSampEn is to quantify the spatial and temporal disorder of a given system. This entails that if we have a spatially and temporally correlated complex system, represented by time series, and add random terms (e.g., white noise) to each time series, then the SysSampEn of the new system should be, with high probability, larger than the original SysSampEn. Similarly, if we replace the time series in a spatially highly correlated system with unrelated time series, the SysSampEn should increase. We use these properties as the basis of 2 tests to determine, for given and , the values of and , which enable a reliable discrimination between more and less ordered systems. For simplicity, we assume to be an integer.
  a) Spatial asynchrony test: For a randomly selected year, we choose randomly 3 neighboring points on the globe , , and . To construct a highly coupled system, we randomly choose 22 times 1 of these 3 nodes. Thus we obtain a system with 22 nodes, where , , and might be present in the system with different frequencies. To contrast, we choose randomly 22 unrelated nodes from the globe to create a system . We perform this procedure times. The accuracy is defined as the fraction of these repetitions in which the system of unrelated nodes is assigned a higher SysSampEn than the highly coupled system (Eqs. 2 and 3). In the present study, we used . The accuracy is shown as a function of and for in SI Appendix, Fig. S3A.
  b) Temporal disorder test: We compare the SysSampEn of an undisturbed climate system G1, here our Niño 3.4 system, with a new system , where random numbers have been added to the original time series. The new system is composed of time series . The are uncorrelated sequences of independent and uniform random numbers in the range . Here, is the average of the individual time series’ SDs between 1 January 1984 and 31 December 2018. We perform this procedure times. The accuracy is defined as in Eq. 2 and is shown as a function of and for in SI Appendix, Fig. S3B.
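The accuracy used in both tests can be sketched as the fraction of trials in which the more disordered system receives the higher value of the complexity measure. The code below is a generic illustration, not the authors' implementation: the function names are hypothetical, `measure` stands for any callable that maps a system (array of time series) to a number, such as a SysSampEn implementation with fixed parameters, and the noise amplitude is left as a parameter because its value is not stated in this text.

```python
import numpy as np

def add_uniform_noise(system, amplitude, rng):
    """Temporal disorder test input: add independent uniform noise in [-amplitude, amplitude)."""
    system = np.asarray(system, dtype=float)
    return system + rng.uniform(-amplitude, amplitude, size=system.shape)

def discrimination_accuracy(measure, ordered_systems, disordered_systems):
    """Fraction of trials in which the disordered system gets the higher measure.

    ordered_systems and disordered_systems are paired, trial by trial.
    """
    hits = [
        measure(disordered) > measure(ordered)
        for ordered, disordered in zip(ordered_systems, disordered_systems)
    ]
    return sum(hits) / len(hits)
```

For the temporal disorder test, each trial would pair the original system with `add_uniform_noise(system, amplitude, rng)`; for the spatial asynchrony test, each trial would pair a system built from neighboring nodes with one built from randomly chosen nodes. Parameter combinations whose accuracy exceeds a chosen level in both tests are then retained.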
Forecasting Algorithm for El Niño Onsets.
We forecast the onset of an El Niño event in the following year if the forecasting index (average forecast over the 4 datasets) is C and the observed ONI in December is below C. Otherwise, we forecast the absence of an El Niño onset. The forecasting index is shown in Fig. 3A as the heights of rectangles.
Note that the forecasting index used in the present work is calculated based on the significant linear relationship between SysSampEn and the magnitudes of El Niño events that occurred in the period 1984 to 2017. To forecast the occurrence or absence of El Niño onsets after 2018, one should keep updating the forecasting index once a new El Niño has terminated, by choosing the function of the SysSampEn which gives the highest correlation with the magnitudes of all terminated El Niño events.
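The onset rule can be sketched as follows. This is a minimal illustration with hypothetical function names: the forecasting index is computed as the average, over datasets, of each dataset's best-fitting linear function evaluated at that dataset's SysSampEn, and the two thresholds are passed in as parameters because their values are not given in this text. The use of "at or above" for the index comparison is an assumption.

```python
def onset_index(entropies, fits):
    """Average over datasets of slope * SysSampEn + intercept (unit: degrees Celsius).

    entropies : one SysSampEn value per dataset for the calendar year considered
    fits      : one (slope, intercept) pair per dataset, from the linear fits
    """
    values = [slope * s + intercept for s, (slope, intercept) in zip(entropies, fits)]
    return sum(values) / len(values)

def forecast_onset(index_value, december_oni, index_threshold, oni_threshold):
    """Forecast an El Niño onset in the following year.

    Assumed rule: the index must reach `index_threshold` and the observed
    December ONI must still be below `oni_threshold`.
    """
    return index_value >= index_threshold and december_oni < oni_threshold
```

The second condition excludes years in which an El Niño is already under way in December, so that the forecast refers to a new onset rather than to the continuation of an existing event.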
Forecasting Algorithm for El Niño Magnitudes.
To forecast the magnitude of an El Niño event starting in a given target year:
1) For one dataset, we determine the parameters of SysSampEn by choosing those that give the highest correlation with the magnitudes of the El Niño events that occurred before the forecasted event. We consider only parameter combinations that can provide a high accuracy level.
2) For the same dataset, we calculate the best-fitting line between the El Niño magnitude (the dependent variable) and the previous calendar year's SysSampEn (the independent variable) by least-squares regression. Only events that occurred before the forecasted event are used in the calculation of the best-fitting line.
3) We calculate the SysSampEn in the calendar year preceding the target year and substitute it into the function of the best-fitting line. This gives the expected magnitude of the El Niño event starting in the target year.
4) Repeat the steps above for the other datasets. The forecasted magnitude (blue dots in Fig. 3C) is obtained by taking the average of the 4 expected magnitudes (+ symbols in Fig. 3C).
5) To determine the error bar of our forecast, we perform the following leave-one-out hindcasts for each of the events that preceded the forecasted El Niño event:
   a) The same as step 1.
   b) To obtain the leave-one-out hindcasted magnitude of each past event, we use all events that occurred before the forecasted event, except for the hindcasted one, to calculate the best-fitting line.
   c) We calculate the SysSampEn in the calendar year preceding the hindcasted event and substitute it into the function of the best-fitting line. This gives the expected magnitude of the hindcasted El Niño event.
   d) Repeat the steps above for the other datasets. The leave-one-out hindcasted magnitude (orange dots in Fig. 3B) is obtained by taking the average of the 4 expected magnitudes (+ symbols in Fig. 3A).

To forecast the magnitude of the 2018 El Niño event, we substitute, for each dataset, the SysSampEn value for the year 2017 into the corresponding best-fitting linear function, which is determined by all of the past El Niño events (except for the 2009 event). Thus we have 4 individual forecasts, which we average to obtain our final forecast.
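The regression, averaging, and leave-one-out steps of the algorithm above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the SysSampEn values have already been computed with the selected parameters (step 1 is not shown), that the observed magnitudes are shared by all datasets, and that the root-mean-square error of the leave-one-out hindcasts serves as the error bar. All function and variable names are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def expected_magnitude(past_entropy, past_magnitude, entropy_prev_year):
    """Steps 2 and 3: fit on past events, evaluate at the previous year's SysSampEn."""
    slope, intercept = fit_line(past_entropy, past_magnitude)
    return slope * entropy_prev_year + intercept

def ensemble_forecast(entropy_by_dataset, magnitudes, entropy_prev_year_by_dataset):
    """Step 4: average the expected magnitudes of the individual datasets."""
    values = [
        expected_magnitude(x, magnitudes, s)
        for x, s in zip(entropy_by_dataset, entropy_prev_year_by_dataset)
    ]
    return sum(values) / len(values)

def leave_one_out_hindcasts(entropy_by_dataset, magnitudes):
    """Step 5: hindcast each past event without using it in the fit."""
    magnitudes = list(magnitudes)
    hindcasts = []
    for i in range(len(magnitudes)):
        per_dataset = []
        for x in entropy_by_dataset:
            x = list(x)
            # Drop event i from the training set, then predict it.
            x_train = x[:i] + x[i + 1:]
            y_train = magnitudes[:i] + magnitudes[i + 1:]
            per_dataset.append(expected_magnitude(x_train, y_train, x[i]))
        hindcasts.append(sum(per_dataset) / len(per_dataset))
    return hindcasts

def rmse(predicted, observed):
    """Root-mean-square error between hindcasted and observed magnitudes."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )
```

With these helpers, a forecast with its error bar would be `ensemble_forecast(...)` plus or minus `rmse(leave_one_out_hindcasts(...), magnitudes)`, where each entry of `entropy_by_dataset` holds the previous-year SysSampEn values of the past events for one dataset.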
Data Availability.
The data/reanalysis that support the findings of this study are publicly available online: ERA-Interim (49), https://apps.ecmwf.int/datasets/; ERA5 (50), https://climate.copernicus.eu/climate-reanalysis; and JRA55-do (51), https://esgf-node.llnl.gov/search/input4mips/.
Acknowledgments
We thank M. J. McPhaden, S. Havlin, Y. Ashkenazy, and N. Marwan for their helpful suggestions. We also acknowledge the East Africa Peru India Climate Capacities project, which is part of the International Climate Initiative. The Federal Ministry for the Environment, Nature Conservation and Nuclear Safety supports this initiative on the basis of a decision adopted by the German Bundestag. The Potsdam Institute for Climate Impact Research is leading the execution of the project together with its project partners, The Energy and Resources Institute and the Deutscher Wetterdienst.
Footnotes
The authors declare no competing interest.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1917007117/-/DCSupplemental.
References
- 1. Dijkstra H. A., Nonlinear Physical Oceanography: A Dynamical Systems Approach to the Large Scale Ocean Circulation and El Niño (Atmospheric and Oceanographic Sciences Library, Springer, ed. 2, 2005).
- 2. McPhaden M. J., Zebiak S. E., Glantz M. H., ENSO as an integrating concept in Earth science. Science 314, 1740–1745 (2006).
- 3. Clarke A. J., An Introduction to the Dynamics of El Nino & the Southern Oscillation (Academic Press, Cambridge, 2008).
- 4. Sarachik E. S., Cane M. A., The El Niño–Southern Oscillation Phenomenon (Cambridge University Press, Cambridge, 2010).
- 5. NOAA, Oceanic Niño Index. https://esrl.noaa.gov/psd/data/correlation/oni.data. Accessed 20 March 2019.
- 6. Ropelewski C. F., Halpert M. S., Global and regional scale precipitation patterns associated with the El Niño/Southern Oscillation. Mon. Weather Rev. 115, 1606–1626 (1987).
- 7. Kiladis G. N., Diaz H. F., Global climatic anomalies associated with extremes in the Southern Oscillation. J. Clim. 2, 1069–1090 (1989).
- 8. Halpert M. S., Ropelewski C. F., Surface temperature patterns associated with the Southern Oscillation. J. Clim. 5, 577–593 (1992).
- 9. Diaz H. F., Hoerling M. P., Eischeid J. K., ENSO variability, teleconnections and climate change. Int. J. Climatol. 21, 1845–1862 (2001).
- 10. Kumar K. K., Rajagopalan B., Hoerling M., Bates G., Cane M., Unraveling the mystery of Indian monsoon failure during El Niño. Science 314, 115–119 (2006).
- 11. Hsiang S. M., Meng K. C., Cane M. A., Civil conflicts are associated with the global climate. Nature 476, 438–441 (2011).
- 12. Burke M., Gong E., Jones K., Income shocks and HIV in Africa. Econ. J. 125, 1157–1189 (2015).
- 13. Schleussner C. F., Donges J. F., Donner R. V., Schellnhuber H. J., Armed-conflict risks enhanced by climate-related disasters in ethnically fractionalized countries. Proc. Natl. Acad. Sci. U.S.A. 113, 9216–9221 (2016).
- 14. Fan J., Meng J., Ashkenazy Y., Havlin S., Schellnhuber H. J., Network analysis reveals strongly localized impacts of El Niño. Proc. Natl. Acad. Sci. U.S.A. 114, 7543–7548 (2017).
- 15. Zebiak S. E., Cane M. A., A model El Niño–Southern Oscillation. Mon. Weather Rev. 115, 2262–2278 (1987).
- 16. McCreary J. P., Anderson D. L. T., An overview of coupled ocean-atmosphere models of El Niño and the Southern Oscillation. J. Geophys. Res.: Oceans 96, 3125–3150 (1991).
- 17. Kleeman R., On the dependence of hindcast skill on ocean thermodynamics in a coupled ocean-atmosphere model. J. Clim. 6, 2012–2033 (1993).
- 18. Kleeman R., Moore A. M., Smith N. R., Assimilation of subsurface thermal data into a simple ocean model for the initialization of an intermediate tropical coupled ocean-atmosphere forecast model. Mon. Weather Rev. 123, 3103–3113 (1995).
- 19. Wang B., Fang Z., Chaotic oscillations of tropical climate: A dynamic system theory for ENSO. J. Atmos. Sci. 53, 2786–2802 (1996).
- 20. Jin F. F., An equatorial ocean recharge paradigm for ENSO. Part I: Conceptual model. J. Atmos. Sci. 54, 811–829 (1997).
- 21. Jin F. F., An equatorial ocean recharge paradigm for ENSO. Part II: A stripped-down coupled model. J. Atmos. Sci. 54, 830–847 (1997).
- 22. Wang B., Barcilon A., Fang Z., Stochastic dynamics of El Niño–Southern Oscillation. J. Atmos. Sci. 56, 5–23 (1999).
- 23. Palmer T. N., et al., Development of a European multimodel ensemble system for seasonal-to-interannual prediction (DEMETER). Bull. Am. Meteorol. Soc. 85, 853–872 (2004).
- 24. Saha S., et al., The NCEP climate forecast system. J. Clim. 19, 3483–3517 (2006).
- 25. Xu J. S., Von Storch H., Predicting the state of the Southern Oscillation using principal oscillation pattern analysis. J. Clim. 3, 1316–1329 (1990).
- 26. Penland C., Magorian T., Prediction of Niño 3 sea surface temperatures using linear inverse modeling. J. Clim. 6, 1067–1076 (1993).
- 27. Kirtman B. P., et al., Current status of ENSO forecast skill (Tech. Rep. 56, CLIVAR Working Group on Seasonal to Interannual Prediction, 2001), p. 26.
- 28. Chen D., Cane M. A., El Niño prediction and predictability. J. Comput. Phys. 227, 3625–3640 (2008).
- 29. Gavrilov A., et al., Linear dynamical modes as new variables for data-driven ENSO forecast. Clim. Dyn. 52, 2199–2216 (2019).
- 30. Webster P. J., Yang S., Monsoon and ENSO: Selectively interactive systems. Q. J. R. Meteorol. Soc. 118, 877–926 (1992).
- 31. Lau K. M., Yang S., The Asian monsoon and predictability of the tropical ocean–atmosphere system. Q. J. R. Meteorol. Soc. 122, 945–957 (1996).
- 32. McPhaden M. J., Tropical Pacific Ocean heat content variations and ENSO persistence barriers. Geophys. Res. Lett. 30, 33-1–33-4 (2003).
- 33. McPhaden M. J., A 21st century shift in the relationship between ENSO SST and warm water volume anomalies. Geophys. Res. Lett. 39, L09706 (2012).
- 34. Ludescher J., et al., Improved El Niño forecasting by cooperativity detection. Proc. Natl. Acad. Sci. U.S.A. 110, 11742–11745 (2013).
- 35. Meng J., Fan J., Ashkenazy Y., Havlin S., Percolation framework to describe El Niño conditions. Chaos 27, 035807 (2017).
- 36. Meng J., Fan J., Ashkenazy Y., Bunde A., Havlin S., Forecasting the magnitude and onset of El Niño based on climate network. New J. Phys. 20, 043036 (2018).
- 37. Nooteboom P. D., Feng Q. Y., López C., Hernández-García E., Dijkstra H. A., Using network theory and machine learning to predict El Niño. Earth System Dynamics 9, 969–983 (2018).
- 38. Hughes T. P., et al., Coral reefs in the Anthropocene. Nature 546, 82–90 (2017).
- 39. Barnston A. G., Tippett M. K., L’Heureux M. L., Li S., DeWitt D. G., Skill of real-time seasonal ENSO model predictions during 2002–11: Is our capability increasing? Bull. Am. Meteorol. Soc. 93, 631–651 (2011).
- 40. Wang-Chun Lai A., Herzog M., Graf H. F., ENSO forecasts near the spring predictability barrier and possible reasons for the recently reduced predictability. J. Clim. 31, 815–838 (2017).
- 41. Richman J. S., Moorman J. R., Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 278, H2039–2049 (2000).
- 42. Pincus S., Approximate entropy (ApEn) as a complexity measure. Chaos 5, 110–117 (1995).
- 43. Pincus S. M., Quantifying complexity and regularity of neurobiological systems. Methods Neurosci. 28, 336–363 (1995).
- 44. Kolmogorov A. N., New metric invariant of transitive dynamical systems and automorphisms of Lebesgue spaces. Dokl. Akad. Nauk SSSR 119, 861–864 (1958).
- 45. Pincus S., Singer B. H., Randomness and degrees of irregularity. Proc. Natl. Acad. Sci. U.S.A. 93, 2083–2088 (1996).
- 46. Lake D. E., Richman J. S., Griffin M. P., Moorman J. R., Sample entropy analysis of neonatal heart rate variability. Am. J. Physiol. Regul. Integr. Comp. Physiol. 283, R789–797 (2002).
- 47. Acharya U. R., et al., Automated diagnosis of epileptic EEG using entropies. Biomed. Signal Process. Control 7, 401–408 (2012).
- 48. Ramdani S., Seigle B., Lagarde J., Bouchara F., Bernard P. L., On the use of sample entropy to analyze human postural sway data. Med. Eng. Phys. 31, 1023–1031 (2009).
- 49. Dee D. P., et al., The ERA-interim reanalysis: Configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc. 137, 553–597 (2011).
- 50. Copernicus Climate Change Service (C3S), ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate. Copernicus Climate Change Service Climate Data Store (CDS) (2017). https://cds.climate.copernicus.eu/cdsapp#/home. Accessed 25 January 2019.
- 51. Tsujino H., et al., JRA-55 based surface dataset for driving ocean–sea-ice models (JRA55-do). Ocean Model. 130, 79–139 (2018).
- 52. Thorpe S. A., An Introduction to Ocean Turbulence (Cambridge University Press, Cambridge, 2007).
- 53. Schiermeier Q., Hunting the Godzilla El Niño. Nature 526, 490–491 (2015).
- 54. Gnanadesikan A. O., Russell A. O., Pradal M. A. O., Abernathey R. O., Impact of lateral mixing in the ocean on El Niño in a suite of fully coupled climate models. J. Adv. Model. Earth Syst. 9, 2493–2513 (2017).
- 55. Busecke J. J. M., Abernathey R. P., Ocean mesoscale mixing linked to climate variability. Sci. Adv. 5, eaav5014 (2019).
- 56. Lu Z., Yuan N., Fu Z., Percolation phase transition of surface air temperature networks under attacks of El Niño/La Niña. Sci. Rep. 6, 26779 (2016).
- 57. Lu Z., Fu Z., Hua L., Yuan N., Chen L., Evaluation of ENSO simulations in CMIP5 models: A new perspective based on percolation phase transition in complex networks. Sci. Rep. 8, 14912 (2018).
- 58. Lai Y. C., Grebogi C., Quasiperiodicity and suppression of multistability in nonlinear dynamical systems. Eur. Phys. J. Spec. Top. 226, 1703–1719 (2017).
- 59. Wang W. X., Lai Y. C., Grebogi C., Data based identification and prediction of nonlinear and complex dynamical systems. Phys. Rep. 644, 1–76 (2016).



