Author manuscript; available in PMC: 2024 Dec 1.
Published in final edited form as: Epilepsia. 2022 Jul 1;64(Suppl 4):S78–S98. doi: 10.1111/epi.17311

Table 3:

Metrics for forecast performance.

General definitions

Forecast horizon: The future period of time for which a forecast is generated.

Uninformative forecasts: Forecasts that do not help decision-making. Trivial solutions, such as perpetually issuing a 0% probability for rare events, can score well yet are uninformative (unskilled) and can be used as reference forecasts.

Discrimination: Measures whether forecasts differ when their corresponding observations differ; for example, if forecasts for days that are wet indicate more rain than for days that are dry, the forecasts can discriminate wetter from drier days.

Deterministic metrics

Accuracy: Measure of discrimination, or how well a forecast correctly identifies or excludes a certain outcome. Formula: (TP + TN) / All

Sensitivity (Se): How often the forecast correctly identifies an event. Formula: TP / (TP + FN)

Specificity (Sp): How often the forecast avoids misidentification. Formula: TN / (TN + FP)

Time in warning (TiW): Fraction of time during which a forecast indicates an event is likely. Formula: (TP + FP) / All

Area under the curve (AUC): Typically assessed as the tradeoff between sensitivity and specificity (or time in warning), obtained by systematically thresholding the algorithm output at all forecasted values. Formula: Se vs. 1 − Sp, or Se vs. TiW

Relative risk: The ratio between the probability of an event in one category or state and the probability of this event in another category. Formula: [TP / (TP + FP)] / [FN / (FN + TN)]
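The deterministic metrics can be computed directly from a 2×2 confusion table. The sketch below uses invented counts purely for illustration; they are not from the paper:

```python
# Hypothetical confusion-matrix counts for a daily seizure forecast
# (illustrative values only, not data from the article).
TP, TN, FP, FN = 40, 900, 50, 10
ALL = TP + TN + FP + FN

accuracy = (TP + TN) / ALL              # fraction of correct calls overall
sensitivity = TP / (TP + FN)            # Se: events correctly identified
specificity = TN / (TN + FP)            # Sp: non-events correctly excluded
time_in_warning = (TP + FP) / ALL       # TiW: fraction of time under warning
# Relative risk: event probability during warnings vs. outside warnings.
relative_risk = (TP / (TP + FP)) / (FN / (FN + TN))
```

Note the trade-off these counts make visible: raising the warning threshold lowers TiW and FP but also lowers sensitivity, which is exactly the curve the AUC summarizes.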
Probabilistic metrics

Observed probability: Frequency of events per unit of time observed in the data, i.e., their empirical probability. Formula: (1/n) Σ_{i=1}^{n} o_i

Expected probability: Based on all previous observations, the frequency (probability) of events expected over a long duration in the future. Formula: lim_{n→∞} (1/n) Σ_{i=1}^{n} o_i

Forecasted probability: Probability of an event forecasted for one time interval in the future. Formula: f_i

Calibration (or reliability): Agreement between forecasted probability and observed probability. Typically calculated by averaging n forecast datapoints in m ranked bins (f̄_k, e.g. average forecast between 0 and 10%) and calculating the corresponding observed event probability, ō_k. For a calibrated forecast, the binned forecasted and observed probabilities match and therefore align on the diagonal of a reliability diagram. Graphically: distance to the diagonal (Fig. S1). Formula: (1/n) Σ_{k=1}^{m} n_k (f̄_k − ō_k)²

Resolution: Ability of the forecast to separate observed probabilities from the average observed probability. Resolution is zero for a flat reliability curve intersecting the y-axis at the expected probability; this corresponds to alignment of the ROC curve with the diagonal. Graphically: separation of the reliability curve from the horizontal line of no resolution (Fig. S1). Formula: (1/n) Σ_{k=1}^{m} n_k (ō_k − ō)²

Sharpness: Tendency to forecast probabilities, f_i, near 0 or 1, as opposed to uniformly distributed forecasts. Sharpness is an attribute of the forecast alone and is not influenced by the observations. Graphically: variance of the distribution of the forecasts. Formula: (1/n) Σ_{i=1}^{n} (f_i − f̄)²

Uncertainty: Depends only on the frequency of events, ō, and is not influenced by the forecast. Uncertainty tends to 0 with very rare (or very frequent) observations (i.e., with increased imbalance) and is greatest (= 0.25) when an event is observed 50% of the time, making forecasts more difficult. Formula: ō (1 − ō)

Skill: Accuracy of a forecast relative to some reference forecast, generally an unskilled forecast such as random chance, shuffled forecasts, or uninformative forecasts. A forecast may score better simply because it is easier to make, which is taken into account when calculating Skill. Formula: 1 − Score / Score_ref

Bias: Mismatch between the mean forecasted value, f̄, and the mean observed probability, ō. Formula: f̄ − ō

Brier score (BS): Mean squared distance between the forecasted value, f_i, and the observation, o_i (set to 1 or 0), calculated at each ith timepoint over n forecasts. Lower Brier scores are better (i.e., tend to zero). Formula: (1/n) Σ_{i=1}^{n} (f_i − o_i)²

Brier skill score (BSS): Improvement of the Brier score over a reference forecast. The BSS tends to 1 when better, is 0 when there is no improvement over the reference, and is negative when worse than the reference. Formula: 1 − BS / BS_ref

TP: true positive, TN: true negative, FP: false positive, FN: false negative, All: TP + TN + FP + FN. m, number of bins in the reliability diagram; n, number of data points (observed or forecasted); f_i, forecasted probability for the ith forecast; o_i, the ith observed probability; f̄ and ō, the average forecasted and observed probabilities; f̄_k and ō_k, the average forecasted and observed probabilities within bin k.
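The probabilistic metrics fit together via the Murphy decomposition, BS = calibration − resolution + uncertainty. A minimal sketch with an invented forecast series (not data from the paper); the decomposition is exact here because the forecast takes a single value per bin:

```python
# f[i]: forecasted probability for interval i; o[i]: 1 if an event occurred.
# Illustrative values only.
f = [0.2, 0.8, 0.2, 0.8, 0.2, 0.8, 0.2, 0.8]
o = [0,   1,   0,   1,   0,   1,   1,   0]
n = len(f)

# Brier score: mean squared distance between forecast and outcome.
bs = sum((fi - oi) ** 2 for fi, oi in zip(f, o)) / n

# Uninformative reference forecast: always issue the average observed
# probability. Its Brier score equals the uncertainty term.
o_bar = sum(o) / n
bs_ref = sum((o_bar - oi) ** 2 for oi in o) / n
bss = 1 - bs / bs_ref                     # Brier skill score

# Murphy decomposition with two bins, one per distinct forecast level.
calibration = resolution = 0.0
for level in (0.2, 0.8):
    idx = [i for i in range(n) if f[i] == level]
    n_k = len(idx)
    f_k = sum(f[i] for i in idx) / n_k    # binned forecasted probability
    o_k = sum(o[i] for i in idx) / n_k    # binned observed probability
    calibration += n_k * (f_k - o_k) ** 2 / n
    resolution += n_k * (o_k - o_bar) ** 2 / n
uncertainty = o_bar * (1 - o_bar)
# bs == calibration - resolution + uncertainty for this binning.
```

With forecasts that vary within a bin, the binned decomposition holds only approximately; the exact identity requires constant forecast values per bin, as here.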