2024 Oct 30;19(10):e0312278. doi: 10.1371/journal.pone.0312278

Table 4. Comparison with existing literature (previously published models were trained on shot event only).

| Research | Model | ROC-AUC | Evaluation criteria |
|---|---|---|---|
| Eggels (2016) [10] | Random Forest | 0.775 | Average cross-validation performance with event-log + player-tracking data from 5020 professional games |
| | Decision Trees | 0.777 | |
| | Logistic Regression | 0.785 | |
| | AdaBoost | 0.692 | |
| Anzer and Bauer (2021) [21] | Random Forest | 0.794 | Unseen 20% test data split from an event-log data set consisting of 105,207 shots in the German Bundesliga |
| | GBM | 0.822 | |
| | Logistic Regression | 0.807 | |
| | AdaBoost | 0.817 | |
| Haaren (2021) [22] | Boosting Machine | 0.793 | Unseen event-log test data consisting of 38,737 shots in European football leagues |
| Mead (2023) [11] | XGBoost | 0.800 | 30% test data split from the WYSCOUT data set [44, 45] |
| The present work | | | Average performance from 100 rounds of five-fold cross-validation + average performance with unseen unbalanced and balanced testing data from 500 models developed |
| Validation data | Random Forest (3 events) ¹ | 0.833 | |
| Unbalanced testing | Random Forest (4 events) ² | 0.827 | |
| Balanced testing | Random Forest (4 events) ³ | 0.826 | |

¹ The highest mean ROC-AUC with validation data was obtained when information from the shot event and the two preceding events (3 events) was included (0 ≤ n ≤ 2 events).

² The highest mean ROC-AUC with unseen unbalanced testing data was obtained when information from the shot event and the three preceding events (4 events) was included (0 ≤ n ≤ 3 events).

³ The highest mean ROC-AUC with unseen balanced testing data was obtained when information from the shot event and the three preceding events (4 events) was included (0 ≤ n ≤ 3 events).
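
For readers comparing the evaluation criteria in the table, the following is a minimal sketch of how a mean ROC-AUC over repeated k-fold cross-validation could be computed. It is not the authors' code. The function names `roc_auc` and `repeated_kfold_auc`, the `fit_predict` callback, and the toy distance data are all hypothetical and introduced only for illustration. The folds here are plain shuffled splits (not stratified), and folds containing a single class are skipped because ROC-AUC is undefined for them.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold_auc(labels, fit_predict, k=5, rounds=100, seed=0):
    """Mean ROC-AUC over `rounds` reshuffled k-fold splits.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains on
    the training indices and returns one score per test index.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    aucs = []
    for _ in range(rounds):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            test_labels = [labels[j] for j in test_idx]
            if len(set(test_labels)) < 2:
                continue  # AUC is undefined for a single-class fold; skip it
            scores = fit_predict(train_idx, test_idx)
            aucs.append(roc_auc(test_labels, scores))
    return mean(aucs)


# Toy usage with made-up data: score each shot by negative distance to goal.
distances = [5, 30, 8, 25, 12, 18, 6, 22, 10, 28]
goals = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]


def score_by_distance(train_idx, test_idx):
    # No training happens in this toy scorer; a real model would fit on train_idx.
    return [-distances[j] for j in test_idx]


print(repeated_kfold_auc(goals, score_by_distance, k=5, rounds=100))
```

A real replication would refit a classifier such as a random forest inside `fit_predict` for each fold. The toy scorer above only shows where that step would go.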