Table 4. Comparison with existing literature (previously published models were trained on the shot event only).
| Research | Model | ROC-AUC | Evaluation criteria |
|---|---|---|---|
| Eggels (2016) [10] | Random Forest | 0.775 | Average cross-validation performance with event-log and player-tracking data from 5020 professional games |
| | Decision Trees | 0.777 | |
| | Logistic Regression | 0.785 | |
| | AdaBoost | 0.692 | |
| Anzer and Bauer (2021) [21] | Random Forest | 0.794 | Unseen 20% test split from an event-log data set consisting of 105,207 shots in the German Bundesliga |
| | GBM | 0.822 | |
| | Logistic Regression | 0.807 | |
| | AdaBoost | 0.817 | |
| Haaren (2021) [22] | Boosting Machine | 0.793 | Unseen event-log test data consisting of 38,737 shots in European football leagues |
| Mead (2023) [11] | XGBoost | 0.800 | 30% test split from the WYSCOUT data set [44, 45] |
| The present work | | | Average performance from 100 rounds of five-fold cross-validation, plus average performance on unseen unbalanced and balanced testing data from the 500 models developed |
| Validation data | Random Forest (3 events) ¹ | 0.833 | |
| Unbalanced testing | Random Forest (4 events) ² | 0.827 | |
| Balanced testing | Random Forest (4 events) ³ | 0.826 | |
1 The highest mean ROC-AUC value with validation data was obtained when information from the shot event and the two preceding events (3 events) was included (0 ≤ n ≤ 2 events).
2 The highest mean ROC-AUC value with unseen unbalanced testing data was obtained when information from the shot event and the three preceding events (4 events) was included (0 ≤ n ≤ 3 events).
3 The highest mean ROC-AUC value with unseen balanced testing data was obtained when information from the shot event and the three preceding events (4 events) was included (0 ≤ n ≤ 3 events).
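
As a reading aid for the ROC-AUC column, the metric can be interpreted as the probability that a randomly chosen goal receives a higher predicted score than a randomly chosen non-goal, with ties counted as one half. The sketch below illustrates that definition only. It is not the evaluation code of any study in the table; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: share of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = goal, 0 = no goal; scores are predicted goal probabilities.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

The nested loop is quadratic in the number of shots, which is fine for illustration. For data sets of the sizes listed in the table, a rank-based implementation from an established library would be the more practical choice.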