Quantifying congestion with player tracking data in Australian football

Jeremy P Alexander; Karl B Jackson; Timothy Bedin; Matthew A Gloster; Sam Robertson

doi:10.1371/journal.pone.0272657

PLoS One. 2022 Aug 8;17(8):e0272657. doi: 10.1371/journal.pone.0272657



Editor: Gábor Vattay

PMCID: PMC9359552 PMID: 35939497

Abstract

With 36 players on the field, congestion in Australian football is an important consideration in identifying passing capacity, assessing fan enjoyment, and evaluating the effect of rule changes. However, no current method of objectively measuring congestion has been reported. This study developed two methods to measure congestion in Australian football. The first continuously determined the number of players situated within various regions of density at successive time intervals during a match using density-based clustering to group players as ‘primary’, ‘secondary’, or ‘outside’. The second method aimed to classify the level of congestion a player experiences (high, nearby, or low) when disposing of the ball using the Random Forest algorithm. Both approaches were developed using data from the 2019 and 2021 Australian Football League (AFL) regular seasons, considering contextual variables, such as field position and quarter. Player tracking data and match event data from professional male players were collected from 56 matches played at a single stadium. The random forest model correctly classified disposals in high congestion (0.89 precision, 0.86 recall, 0.96 AUC) and low congestion (0.98 precision, 0.86 recall, 0.96 AUC) at a higher rate compared to disposals nearby congestion (0.72 precision, 0.88 recall, 0.88 AUC). Overall, both approaches quantify the characteristics of congestion more efficiently than existing methods, eliminating manual input from human coders and allowing future comparison across additional contextual variables, such as seasons, rounds, and teams.

Introduction

Australian football (AF) is a popular invasion sport played with two teams of 18 players on the field at any one time [1, 2]. Matches are divided into four quarters, each with 20 minutes of playing time [3]. The premier competition is the Australian Football League (AFL), which currently consists of 18 teams located across Australia [1]. Match-play has experienced a continual state of evolution, with improved player athleticism and professionalism, rule changes, innovative coaching tactics, and specialised training regimes all contributing to a faster game speed [4, 5]. Contemporary AF has been described as largely defensive, whereby a combination of an increased number of tackles, contested possessions, stoppages in play, and a decrease in scoring and effective disposal rates, have been associated with greater player density around the ball [4, 6].

Given the potential negative implications this style of play may have on viewership and participation, various rule changes have been continually introduced by the AFL, with the intention of either negating or arresting the abovementioned trends [6]. Some recent modifications include enforcing an even spread of players across the field when a quarter begins or after a goal is scored and restricting opposition movement after a player marks the ball, to permit less restricted ball movement [7]. These rule changes have typically aimed to reduce congestion and promote scoring by diminishing defensive strategies and stimulating more attacking styles of play through a more continuous free-flowing game [6, 7].

Preliminary investigations involving the evolution of match-play in AF promoted the notion that the speed of the game was increasing, which was estimated by measuring the average velocity of the ball in m·s⁻¹ [5]. Specifically, game speed almost doubled in the Victorian Football League (VFL) and AFL, between the 1961 and 1997 seasons [5]. The overall trend of faster game speed continued until 2007, followed by a plateau, before a decrease through until 2015 [5]. This finding may correspond with teams allocating more emphasis on defensive actions and intentionally increasing player density around the ball, thereby increasing congestion [6, 8]. Consequently, opposition ball movement may be restricted by limiting the time and space afforded to opposing players [4, 5]. Congestion in AF has typically been inferred via video analysis, using a human coder to count the number of players within a five-metre radius of the ball at 15 s intervals [5]. It was revealed that congestion steadily increased through 2015, with 28.6% of time in-play witnessing at least five players being recorded within five metres of the ball, more than double the proportion recorded in 2006 [5].

Nonetheless, a metric that provides a reliable and continuously measured description of congestion remains absent. Previous methods have been laborious, inefficient, and prone to error. In addition, an intermittent recording of the player count within a pre-determined region is inadequate to determine the comparative degree of congestion a player is confronted with when executing skilled actions, such as disposing of the ball. Whilst raw player counts can be used to infer congestion, the designated number of players as the threshold is somewhat arbitrary, which may render it challenging to agree on a widely accepted definition.

Considering the above, a reliable and valid method that can be scaled in an efficient manner would be useful. With the advent of player tracking technologies, a suitable data source is available, whereby the location of teammates and opponents at each point in time can be processed with machine learning algorithms to quantify the characteristics of congestion more effectively. Therefore, the aim of this study was to develop two methods to measure congestion in AF. The first continuously determined the number of players situated within various regions of density at successive time intervals during a match using clustering. The second determined the level of congestion a player experiences when disposing of the ball using a classification approach. Both approaches were used to compare congestion across common AF contextual variables, such as field position and quarter.

Materials and methods

Data collection

Ethical clearance was granted by the University Human Research Ethics Committee (application number HRE20-172). Data were collected from the 2019 and 2021 AFL regular seasons and pooled for all analyses. Matches played in 2020 were not included due to the season alterations that were implemented because of COVID-19. To ensure consistent tracking data and uniform field dimensions, only matches (n = 56) played at a single stadium (Marvel Stadium, Melbourne, Australia) were included, where the field dimensions were 159.5 m × 128.8 m (length × width). Positional data in the form of Cartesian coordinates for each match were gathered using Catapult ClearSky 10 Hz local-positioning system (LPS) devices for all 44 participants (Catapult Sports, Melbourne, Australia). Teams were labelled Home Team and Away Team for each match to streamline data processing and visualisation. Matches were undertaken with four 20-min quarters (Q1, Q2, Q3, Q4) with breaks interspersed between periods. Tracking devices were housed in a sewn pocket located on the upper back of the jersey. Periods of play in which the positioning of one or more players was lost were omitted.

Data analysis

Match event data were recorded by trained human operators to the nearest tenth of a second (Champion Data, Pty Ltd., Melbourne, Australia). These data provide information regarding players executing skilled actions, such as kicks, handballs, and marks, where disposals are the total number of kicks and handballs. Previous investigations have assessed the validity and reliability of similar match event data and reported very high levels (ICC range = 0.947–1.000) of agreement [9]. Movement data derived from tracking devices were also recorded to the nearest tenth of a second and were synchronised with match event data using the Unix timestamps present in both datasets [10]. This combined dataset was used to infer the location of the ball, which was also specified to the nearest tenth of a second. Field position of the ball was separated into four zones (defensive 50; D50, defensive midfield; DM, attacking midfield; AM, forward 50; F50) by the two 50 m arcs and the centre of the ground, which is orthodox for AF research and statistical providers [11–13]. Periods where the ball was out of play (for example, during a break between quarters, when the umpire had the ball before a stoppage, and after scores) were excluded from the investigation [14].
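The zone definition above can be illustrated with a minimal Python sketch. The coordinate conventions are assumptions made for illustration only and are not specified in the text: the origin is taken as the centre of the ground, the x-axis runs goal to goal, and each 50 m arc is centred on the middle of its goal line. The function name `field_zone` is hypothetical.

```python
import math

FIELD_LENGTH_M = 159.5  # field length reported for the stadium in this study
ARC_RADIUS_M = 50.0     # radius of each 50 m arc


def field_zone(x, y, attacking_positive_x=True):
    """Return 'D50', 'DM', 'AM' or 'F50' for a ball position in metres.

    Assumed (hypothetical) conventions: origin at the centre of the ground,
    x-axis goal to goal, arcs centred at x = +/- FIELD_LENGTH_M / 2, y = 0.
    """
    half = FIELD_LENGTH_M / 2
    if not attacking_positive_x:
        x = -x  # mirror so that the attacking goal is always at +x
    if math.hypot(x - half, y) <= ARC_RADIUS_M:
        return "F50"
    if math.hypot(x + half, y) <= ARC_RADIUS_M:
        return "D50"
    # Outside both arcs: the centre of the ground splits the midfield.
    return "AM" if x >= 0 else "DM"


for position in [(70, 0), (10, 0), (-10, 0), (-70, 0)]:
    print(position, field_zone(*position))
```

Because the zone is relative to the team in possession, the same ball location maps to opposite zones for the two teams; the `attacking_positive_x` flag is one simple way to express that.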

Continuous congestion during match play

The proposed concept of measuring continuous congestion is to differentiate between higher player density and lower player density at each successive time interval during a match. Clustering is ideally suited to this proposition due to its capacity to partition data into groups based on the similarities of their properties [15]. Depending on the state of play at any one time, most players could be positioned in a single region of the ground, producing one large cluster of congestion, or separated into multiple distinct groups, generating several smaller clusters of congestion, or evenly spaced across the field of play with no observable congestion. Consequently, it is necessary that any potential clustering technique involves a flexible mechanism that can manage a variable number of groups, rather than a strict assignment of players to a pre-determined fixed number of groups.

Density-based clustering techniques, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points to Identify the Clustering Structure (OPTICS), satisfy these requirements for quantifying congestion. Specifically, these algorithms are based on reachability, whereby data are clustered by identifying a minimum number of points within a neighbourhood of a given radius [16, 17]. Points that meet this criterion are considered ‘core points’, whilst points not meeting this criterion are referred to as ‘noise’ [18]. The OPTICS algorithm is preferable to DBSCAN as it has the capacity to detect meaningful clusters of varying density by ordering the points such that those that are spatially nearest become neighbours, and it only requires the minimum number of points as a mandatory input parameter [15]. As such, the OPTICS model was used to differentiate players within regions of higher density from those within regions of lower density.

Analysis for continuous congestion during match play

The OPTICS clustering model was implemented using the scikit-learn library in Python [19]. Players were clustered at each successive time interval during a match, whereby all 36 players were labelled as either core points or noise based on their respective locations. To be considered a core point, a player p had to have a minimum of three additional players (MinPts) within its ε−neighbourhood N_ε(p), where the maximum radius ε was 7.5 m. The core−distance of a player is the smallest radius around that player that contains MinPts players (the MinPts−distance), meaning that a player becomes a core point only if enough other players are contained in its neighbourhood. Otherwise, core−distance is UNDEFINED.

$$\text{core-distance}_{\varepsilon,\,MinPts}(p) = \begin{cases} \text{UNDEFINED}, & \text{if } |N_{\varepsilon}(p)| < MinPts \\ MinPts\text{-distance}(p), & \text{otherwise} \end{cases}$$

The reachability−distance of another player o from p is either the distance between o and p, or the core−distance of p, whichever is larger.

$$\text{reachability-distance}_{\varepsilon,\,MinPts}(o, p) = \begin{cases} \text{UNDEFINED}, & \text{if } |N_{\varepsilon}(p)| < MinPts \\ \max(\text{core-distance}_{\varepsilon,\,MinPts}(p),\ \text{distance}(p, o)), & \text{otherwise} \end{cases}$$
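To make the two definitions concrete, the following is a minimal pure-Python sketch rather than the implementation used in the study. It follows the verbal definition, in which the reachability−distance of o from p uses the core−distance of p. The function names, the index-based interface, and the example coordinates are hypothetical; `None` stands in for UNDEFINED, and MinPts is counted as additional players, excluding the player itself.

```python
import math


def core_distance(i, players, eps=7.5, min_pts=3):
    """Distance from player i to its min_pts-th nearest other player.

    Returns None (UNDEFINED) when fewer than min_pts other players lie
    within eps metres of player i.
    """
    dists = sorted(
        math.dist(players[i], q) for j, q in enumerate(players) if j != i
    )
    within = [d for d in dists if d <= eps]
    return within[min_pts - 1] if len(within) >= min_pts else None


def reachability_distance(o, p, players, eps=7.5, min_pts=3):
    """Reachability-distance of player o from player p (both indices)."""
    core = core_distance(p, players, eps, min_pts)
    if core is None:
        return None  # p is not a core point, so the value is UNDEFINED
    return max(core, math.dist(players[o], players[p]))


# Hypothetical positions in metres: four players close together, one isolated.
players = [(0, 0), (1, 0), (0, 2), (3, 0), (30, 30)]
print(core_distance(0, players))             # 3.0
print(core_distance(4, players))             # None
print(reachability_distance(1, 0, players))  # 3.0
```

In the example, player 0 has three other players within 7.5 m, the farthest of which is 3 m away, so its core−distance is 3.0. Player 1 is only 1 m from player 0, so its reachability−distance from player 0 is the larger of the two values, 3.0.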

Players were considered to be within congestion if identified as core points, while players were deemed outside congestion if recorded as noise. Clusters of congestion were originally assigned output labels of 0-n, while -1 was assigned to all points clustered as noise (see Fig 1) [20, 21]. These labels were converted to provide a practical description of congestion and to differentiate between separate clusters of congestion if more than one cluster was identified for a unique time interval (see Fig 1). Specifically, the cluster of congestion with the highest count of players was re-labelled ‘primary’ congestion, while any remaining cluster(s) containing fewer players were re-assigned as ‘secondary’ congestion. Finally, -1 was converted to ‘outside’ congestion.
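The clustering and re-labelling steps can be sketched with scikit-learn's `OPTICS` class for a single time interval. This is an illustrative sketch under stated assumptions, not the authors' code. The text does not report which cluster-extraction setting was used, so the sketch assumes `cluster_method="dbscan"` at the 7.5 m radius because its output is easy to reason about. It also assumes `min_samples=4`, since scikit-learn counts the point itself, which corresponds to three additional players. The function name `label_congestion`, the tie-break for equally sized clusters, and the example coordinates are hypothetical.

```python
import numpy as np
from sklearn.cluster import OPTICS


def label_congestion(xy, max_eps=7.5, min_samples=4):
    """Label each player in one frame as 'primary', 'secondary' or 'outside'."""
    raw = OPTICS(
        min_samples=min_samples,  # includes the player itself
        max_eps=max_eps,
        cluster_method="dbscan",  # assumed extraction method
        eps=max_eps,
    ).fit_predict(xy)

    out = np.full(len(raw), "outside", dtype=object)  # raw label -1 is noise
    cluster_ids, counts = np.unique(raw[raw >= 0], return_counts=True)
    if len(cluster_ids) > 0:
        # Largest cluster becomes primary; ties go to the lowest raw label.
        primary = cluster_ids[np.argmax(counts)]
        out[raw >= 0] = "secondary"
        out[raw == primary] = "primary"
    return out


# Hypothetical frame: a group of six, a group of four, and six isolated players.
xy = np.array(
    [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1),
     (50, 50), (51, 50), (50, 51), (51, 51),
     (-40, -40), (-40, 30), (30, -40), (70, 0), (-70, 10), (20, 60)],
    dtype=float,
)
labels = label_congestion(xy)
print(list(labels))
```

Other extraction settings, such as scikit-learn's default `cluster_method="xi"`, can return different clusters for the same frame, so the choice should be matched to the intended definition of congestion.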

Fig 1 — Each column of subplots **(A-B, C-D, E-F)** are pairs representing the same time interval. Upper panel **(A, C, E)** represents initial output labels from the clustering model. Lower panel **(B, D, F)** displays the corresponding labels after translating into primary, secondary, and outside labels.

The OPTICS model was iterated on each individual time point in every match to assess the proportion of players within each cluster for each unique point in time. Descriptive statistics (mean ± standard deviation) for the proportion of players within each cluster from each match were compared across field position (see Fig 2) and quarter (see Fig 3).
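As a rough illustration of this summary step, the sketch below computes the mean and standard deviation of the per-frame proportion of players carrying a given label, grouped by field position. It is a simplification: the study summarised proportions per match, whereas this example pools a handful of hypothetical frames. The function name and the input layout (a list of `(zone, labels)` pairs) are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_zone(frames, label="primary"):
    """Return {zone: (mean, sd)} of the proportion of players with `label`."""
    by_zone = defaultdict(list)
    for zone, labels in frames:
        by_zone[zone].append(labels.count(label) / len(labels))
    return {
        zone: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for zone, values in by_zone.items()
    }


# Hypothetical frames, each with 36 player labels.
frames = [
    ("D50", ["primary"] * 9 + ["outside"] * 27),
    ("D50", ["primary"] * 18 + ["outside"] * 18),
    ("AM", ["outside"] * 36),
]
summary = summarise_by_zone(frames)
print(summary)
```

The same grouping applies to quarter by replacing the zone key with the quarter label.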

Fig 2 — D50, Defensive 50; DM, Defensive Midfield; AM, Attacking Midfield; F50, Forward 50.

Fig 3 — Q1, Quarter 1; Q2, Quarter 2; Q3, Quarter 3; Q4, Quarter 4.

Classifying level of congestion during disposals

Whilst the aforementioned clustering model is able to group players inside congestion compared to those outside congestion, it is unsuitable to categorise the level of congestion a player is confronted with when disposing of the ball. Simply, as the output is reduced to either ‘inside’ or ‘outside’ congestion, it disregards a more specific account of congestion that describes the level of congestion experienced by the ball-carrier. As such, a supervised machine learning algorithm is preferable for this task, whereby the level of congestion can be classified from an input vector of variables. Specifically, a ground truth label can be ascribed to a set of disposals, from which a classifier model attempts to predict the same labels using an optimal set of features and parameters [22]. To provide a ground truth of the level of congestion a player experiences when disposing of the ball, a training dataset was generated by professional analysts from Champion Data, whereby 1943 disposals were manually labelled using three categories outlined in Table 1.

Table 1. Definition of the level of congestion when disposing of the ball.

Disposal label	Description
High Congestion	Several players within 0–5 m of the ball-carrier
Nearby Congestion	Multiple players within 0–10 m of the ball-carrier but there is some space to make a decision
Low Congestion	There is one or no players within 10 m of the ball-carrier


Determining an appropriate set of features to train a given model depends on technical comprehension, prior knowledge of the problem, or the purpose of the analysis [23]. In consultation with the same match analysts from Champion Data, a range of spatiotemporal features (Table 2) were developed. These features were assessed for every disposal in the dataset, which delivered information from which a model could classify the aforementioned level of congestion.

Table 2. Definition of spatiotemporal features for disposal classification model.

Features	Description
Immediate Player Count (IPC)	Total count of all players within a 5 m radius of the ball-carrier
Extended Player Count (EPC)	Total count of all players within a 10 m radius of the ball-carrier
Immediate Defenders (IDC)	Total count of defenders within a 5 m radius of the ball-carrier
Extended Defenders (EDC)	Total count of defenders within a 10 m radius of the ball-carrier
Frontal Player Count (FPC)	Total count of players within frontal 90-degree quadrant of the ball-carrier
Right Player Count (RPC)	Total count of players within right 90-degree quadrant of the ball-carrier
Left Player Count (LPC)	Total count of players within left 90-degree quadrant of the ball-carrier
Behind Player Count (BPC)	Total count of players within behind 90-degree quadrant of the ball-carrier
Available space (AS)	Total area that intersects between radius that surrounds player/ball and the field of play
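The count-based features in Table 2 can be sketched in plain Python as follows. Several details are assumptions, because the table does not specify them: ‘defenders’ are taken to be opposition players, the ball-carrier's facing direction is supplied as a heading angle, the quadrant counts are limited to a hypothetical 10 m radius, and angles use the usual mathematical convention (anticlockwise positive, so a positive relative bearing is to the carrier's left). The ‘available space’ feature is omitted. The function name and input layout are hypothetical.

```python
import math


def disposal_features(carrier, heading_deg, players, quadrant_radius=10.0):
    """Count players around the ball-carrier.

    carrier: (x, y) in metres; heading_deg: facing direction in degrees;
    players: iterable of (x, y, is_opponent) for all other players.
    """
    feats = dict.fromkeys(
        ["IPC", "EPC", "IDC", "EDC", "FPC", "RPC", "LPC", "BPC"], 0
    )
    cx, cy = carrier
    for x, y, is_opponent in players:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist <= 5.0:
            feats["IPC"] += 1
            feats["IDC"] += int(is_opponent)
        if dist <= 10.0:
            feats["EPC"] += 1
            feats["EDC"] += int(is_opponent)
        if 0 < dist <= quadrant_radius:
            # Bearing relative to the heading, wrapped into [-180, 180).
            rel = (math.degrees(math.atan2(dy, dx)) - heading_deg + 180) % 360 - 180
            if -45 <= rel < 45:
                feats["FPC"] += 1
            elif 45 <= rel < 135:
                feats["LPC"] += 1
            elif -135 <= rel < -45:
                feats["RPC"] += 1
            else:
                feats["BPC"] += 1
    return feats


# Hypothetical disposal: carrier at the origin, facing along +x.
others = [(3, 0, True), (0, 4, False), (0, -8, True), (-9, 0, False), (20, 20, True)]
features = disposal_features((0, 0), 0.0, others)
print(features)
```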


Analysis for classifying level of congestion during disposals

All analyses were performed using the scikit-learn library in Python. To select the appropriate classification model, base model testing was run using the lazypredict package, which is a repository of classifier algorithms [24]. The classifier that yielded the highest accuracy was the Random Forest (RF) algorithm. The RF classifier is a non-linear machine learning technique used for classification and regression, whereby an ensemble of decision trees is used to calculate the mode of the classes predicted by the individual trees and the ranking of classifiers [25]. The RF classifier was used to assess the disposal labels (High Congestion, Nearby Congestion, Low Congestion) when referencing the spatiotemporal features.

The data were split into training and testing (80:20) datasets [26, 27]. The RF classifier’s hyperparameters were optimised using GridSearchCV in scikit-learn [28]. After fine-tuning for optimal performance, the Gini Index was selected as the criterion, the number of trees was fixed at 500, the maximum depth of each tree was set to 10, the minimum number of samples required to split a node was set to 50, and the minimum number of samples per leaf was set to 5. The feature importance scores were determined by the Gini Index. Features were then eliminated iteratively, removing the lowest-contributing feature at each iteration until a decrease in model performance occurred [27], which resulted in the removal of the ‘available space’ feature. The remaining 8 features were used for modelling.
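The split, grid search, and Gini-based importance ranking can be sketched with scikit-learn as below. This is a sketch under stated assumptions rather than the study's pipeline: the data are synthetic placeholders generated with a hypothetical labelling rule, the grid values and the 3-fold cross-validation are illustrative because the values actually searched are not reported, and `random_state` is fixed only for reproducibility. The 500 trees, Gini criterion, and minimum leaf size of 5 follow the values reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

FEATURES = ["IPC", "EPC", "IDC", "EDC", "FPC", "RPC", "LPC", "BPC"]

# Synthetic stand-in for the labelled disposals (NOT the study data).
rng = np.random.default_rng(0)
X = rng.integers(0, 8, size=(400, len(FEATURES))).astype(float)
# Hypothetical labelling rule, used only so the example has learnable structure.
y = np.where(X[:, 0] >= 4, "high", np.where(X[:, 1] >= 4, "nearby", "low"))

# 80:20 split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grid only; it happens to include the reported final values.
param_grid = {"max_depth": [5, 10], "min_samples_split": [20, 50]}
search = GridSearchCV(
    RandomForestClassifier(
        n_estimators=500, criterion="gini", min_samples_leaf=5, random_state=0
    ),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best = search.best_estimator_
print("best parameters:", search.best_params_)
print("held-out accuracy:", best.score(X_test, y_test))

# Gini-based importances, highest first; the lowest-ranked feature would be
# the candidate for removal in an iterative elimination loop.
ranked = sorted(zip(FEATURES, best.feature_importances_), key=lambda t: -t[1])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

An elimination loop would refit the model without the lowest-ranked feature and stop once the held-out score drops, which mirrors the procedure described above.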

To visualise and interpret the feature importance, the Shapley Additive exPlanations (SHAP) package was used [29]. This package displays the global importance of each feature for classifying the disposal label and the local explanation of each feature exhibiting the direction of the relationship between the feature and disposal label [29]. Model performance was assessed based on standard metrics including precision (ratio of true positives to predicted positives), recall (ratio of true positives to actual positives) and F1-Score (harmonic mean of precision and recall) [26]. The confusion matrix, receiver operating characteristic (ROC) curves, and precision-recall (PR) curves were also examined to determine the performance of the model [30]. Although ROC curves are typically used for binary classification, they can be applied to multi-class classification by using the one-vs-all approach and considering the micro-average curve to analyse the overall performance of the classifier [31]. The ROC curve is generated by plotting the true positive rate against the false positive rate [30]. The area under the ROC curve (ROC-AUC) describes the capacity to distinguish between classes, with an ROC-AUC of 1.0 representing that the classifier can differentiate classes perfectly [32]. The precision-recall curve is produced by plotting precision against recall, and the area under it (PR-AUC) provides an indication of performance on the positive samples in a dataset [33].
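The precision, recall, and F1 definitions can be made concrete with a short pure-Python sketch that treats each class one-vs-all. The confusion matrix below is a made-up toy example, not a result from the study, and the function name is hypothetical.

```python
def per_class_metrics(cm, labels):
    """One-vs-all precision, recall and F1 for each class.

    cm[i][j] is the number of samples with actual class i predicted as class j.
    """
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        predicted = sum(row[k] for row in cm)  # column total: predicted positives
        actual = sum(cm[k])                    # row total: actual positives
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[name] = {"precision": precision, "recall": recall, "f1": f1}
    return out


labels = ["high", "nearby", "low"]
# Toy counts for illustration only (rows: actual, columns: predicted).
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = per_class_metrics(cm, labels)
for name, m in metrics.items():
    print(name, {key: round(value, 3) for key, value in m.items()})
```

In this toy matrix the middle class receives misclassifications from both neighbours, which lowers its precision relative to the two outer classes.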

The resulting RF classifier computed the label of every disposal in the dataset. Descriptive statistics (mean ± standard deviation) for each disposal label from each match determined the breakdown of the level of congestion during disposals, compared across field position and quarter.

Results

Continuous congestion during match play

Results from the OPTICS clustering model are presented in Figs 2 and 3. When assessing field position, the proportion of players in primary and secondary congestion was marginally greater in the D50 and F50 when compared to the DM and AM (Fig 2). Conversely, outside congestion witnessed a greater proportion of players in the AM and DM when compared to the D50 and F50. Finally, as the match progressed across each quarter, the proportion of players within primary congestion observed a minor decrease, while the segment of players outside congestion slightly increased (Fig 3).

Classifying level of congestion during disposals

Fig 4 displays the global feature importance and the local explanation summary exhibiting the direction of the relationship between each feature and disposal label. Immediate player count, extended player count, and immediate defender count were the most important features in classifying the disposal label. Evaluation of the RF classifier is presented in Figs 5 and 6, using metrics, confusion matrix, ROC-AUC and PR-AUC curves for each disposal label. Disposals within high congestion, 0.89 precision and 0.86 recall, and low congestion, 0.98 precision and 0.86 recall, were correctly classified at a considerably higher rate compared to disposals nearby congestion, 0.72 precision and 0.88 recall. The ROC-AUC of 0.96 for disposals in high congestion and 0.96 for low congestion were greater than 0.88 in disposals nearby congestion. Similarly, the PR-AUC of 0.9 and 0.96 for disposals in high congestion and low congestion, were greater than 0.69 for disposals nearby congestion.

Fig 5 — Evaluation metrics, including precision, recall, and F1-score assessing the performance of the RF model to classify each disposal label **(A)**. Confusion matrix of the RF model displaying correctly classified and misclassified disposal labels **(B)**.

Fig 6 — Evaluation of RF model to classify each disposal label expressed by ROC curves **(A)**, and PR curves **(B)**.

The breakdown of the level of congestion for each disposal label compared across field position and quarter is presented in Figs 7 and 8. When assessing field position (Fig 7), the proportion of disposals in high congestion increased as the ball transitioned from D50 to F50. Disposals nearby congestion increased from D50 to AM, while those outside congestion concurrently decreased. Conversely, progressing from the AM to F50 observed a decrease in disposals nearby congestion and an increase in disposals outside congestion. As a match proceeded across each quarter (Fig 8), disposals within high congestion steadily decreased, while disposals nearby congestion and outside congestion gradually increased.

Fig 7 — D50, Defensive 50; DM, Defensive Midfield; AM, Attacking Midfield; F50, Forward 50.

Fig 8 — Q1, Quarter 1; Q2, Quarter 2; Q3, Quarter 3; Q4, Quarter 4.

Discussion

This study developed two methods to measure congestion in AF. The first continuously determined the number of players situated within various regions of density at successive time intervals during a match, whilst the second classified the level of congestion a player experiences when disposing of the ball. This information provides a scalable method to quantify congestion during matches.

The first method showed that players are within a cluster of congestion (primary or secondary) between 23% and 26% of a typical game. Whilst an exact comparison to existing research is challenging, given the differences in methodology, previous studies report similar findings in AF [5]. Specifically, 28.6% of total time in-play witnessed at least five players within 5 m of the ball during 15 second intervals in the 2015 season. This finding was more than double that of the 2006 season, which recorded 11.2% of time in-play.

Whilst a continuous account of congestion provides an indication of players located within clusters of greater density across a field of play, it is unsuitable to quantify the level of congestion a player experiences when disposing of the ball. Specifically, the output description is limited to ‘inside’ or ‘outside’ congestion, thereby excluding a more nuanced or tiered description of congestion experienced by the ball-carrier. In addressing these limitations, the RF model was able to correctly classify disposals within high congestion and low congestion at a higher rate when compared to disposals nearby congestion. These findings may be attributed to the fluid nature of congestion, whereby certain disposals may span across two separate categories, rather than neatly fit into a single category. For example, a disposal classified as nearby congestion may contain characteristics that are similar to either high congestion or low congestion, as is evident when considering the false positives in the confusion matrix. Overall, the model eliminates the task of manually coding each unique label, thereby establishing a scalable method to quantify the level of congestion a player experiences when disposing of the ball.

Overall, more than 60% of disposals encountered high congestion or nearby congestion. This suggests that large segments of match-play experience greater density around the ball-carrier, which may instil pressure and influence passing capacity. Previous research confirms comparable sentiments in AF, with effective disposal rates steadily declining since 2005, coinciding with a concomitant increase in the number of tackles [4, 34]. Disposals performed under low congestion decreased as teams transitioned the ball towards their attacking end. After a team obtains possession of the ball in the defensive half, the opposition may fold back in numbers to establish defensive stability, rather than press up the field to instigate a turnover in possession. Corresponding studies corroborate this tactical team behaviour in AF, whereby teams produced a numerical advantage in their defensive half [35]. Although disposals in low congestion increased in the F50 compared to the AM, this is likely due to set shots on goal. Under this scenario, opposition players are prohibited from entering a designated space around the player, thereby allowing a shot at goal unimpeded by the opposing team. The level of congestion steadily declined across each quarter, which may suggest that, as time in play elapses, player fatigue or increased scoring margins lead to a decrease in intensity or effort once the match outcome is largely determined.

In response to a steadily declining scoring rate and a predominantly defensive game style, the AFL, guided by the Laws of the Game Charter, has continually implemented major rule changes to enhance fan enjoyment [6, 8]. Initially, rule changes involved capping interchange rotations and initiating quicker re-starts to play after a score or the ball going out of bounds [8]. More recently, teams are required to maintain even numbers in each section of the ground at the commencement of each quarter and after a goal is scored [7]. In addition, all opposing players are prohibited from entering a designated region surrounding a player that obtained a mark, whilst compelling the player standing the mark to remain stationary [7]. Ostensibly, such rule changes constrain the defending team’s ability to restrict ball transition, thereby allowing for more attacking ball movement for the offensive team, which may increase scoring [7]. However, scoring rates have remained stubbornly subdued [36], which suggests that the current rule changes may not be enticing teams to alter their collective movement behaviour. Indeed, previous studies support this sentiment in AF, whereby teams are preserving a numerical advantage in their defensive 50, which reduces the likelihood of generating a shot on goal [35]. The direct causes of reduced scoring rates are likely multifaceted and require further investigation.

The trend towards a greater emphasis on defensive strategies and lower scoring rates has been reported in other invasion sports including football, rugby league, and rugby union [37, 38]. A key component of this development may be increased player density and congestion [38]. Nonetheless, limited work has been undertaken in measuring player congestion, except for techniques that involve considerable human input. Both approaches developed in this study demonstrate a scalable method to quantify congestion during match-play that requires minimal manual control. This information can be applied to various aspects of performance analysis in invasion sports, such as evaluating the efficacy of training programmes, assessing the physical demands of sport performance, quantifying the value of passes, and informing expected goals metrics. Specifically, sport science practitioners may incorporate a more representative training design by targeting drills that replicate congestion witnessed in match-play. The frequency of passing in football has increased in recent years [37]. Successful teams also record greater possession rates and an increased number of passes per game compared to their losing counterparts [39]. As player proximity is a central component in skill execution and attaining a goal [40], incorporating congestion may provide an enhanced understanding when quantifying the value of passes and expected goals metrics.

Whilst the dataset included in this investigation accounts for 56 matches across multiple seasons, it was limited to matches played at a single stadium. The inclusion of additional data may identify a more nuanced representation of congestion and whether any variations exist between teams, stadiums, and independent rounds. The machine learning models proposed in this study to quantify congestion were novel, which means that the parameters used to tune the algorithms were likewise exploratory. Although the models were thoroughly trialled and tested using various input parameters, broader implementation by a wider range of experts may assist in ensuring the methodology is valid and reliable and in determining whether alterations need to be tailored for specific applications. Additional investigations may also determine how congestion located elsewhere on the field influences subsequent match events. Specifically, future work could determine how greater congestion forward of the ball influences match event outcomes, such as retaining possession or scoring.

Conclusion

This study developed two methods for measuring congestion in AF. The clustering model identified that players were within a cluster of congestion between 23% and 26% of a typical game. The random forest model was able to correctly classify disposals in high congestion and low congestion at a higher rate compared to disposals nearby congestion. Both modelling approaches demonstrate a more efficient method to quantify the characteristics of congestion, thereby eliminating manual input. This information provides a scalable method to quantify congestion, which allows for the comparison between seasons, rounds, and teams and can be used to inform player training, team strategy and rule changes.

Supporting information

S1 File

(ZIP)


Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The authors Jeremy Alexander (JA), Karl Jackson (KJ), Timothy Bedin (TB), and Matthew Gloster (MG), are part-time or full-time employees of Champion Data. The funder provided support in the form of salaries for authors JA, KJ, TB, and MG but did not have any additional role in the study design, statistical analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

1. Gray AJ, Jenkins DG. Match analysis and the physiological demands of Australian football. Sports Medicine. 2010;40(4):347–60. doi: 10.2165/11531400-000000000-00000
2. Robertson S, Back N, Bartlett JD. Explaining match outcome in elite Australian Rules football using team performance indicators. Journal of Sports Sciences. 2016;34(7):637–44. doi: 10.1080/02640414.2015.1066026
3. Mason RJ, Farrow D, Hattie JA. An analysis of in-game feedback provided by coaches in an Australian Football League competition. Physical Education and Sport Pedagogy. 2020;25(5):464–77.
4. Woods CT, Robertson S, Collier NF. Evolution of game-play in the Australian Football League from 2001 to 2015. Journal of Sports Sciences. 2017;35(19):1879–87. doi: 10.1080/02640414.2016.1240879
5. Norton KI, Craig N, Olds T. The evolution of Australian football. Journal of Science and Medicine in Sport. 1999;2(4):389–404. doi: 10.1016/s1440-2440(99)80011-5
6. Lane JC, van der Ploeg G, Greenham G, Norton K. Characterisation of offensive and defensive game play trends in the Australian Football League (1999–2019). International Journal of Performance Analysis in Sport. 2020;20(4):557–68.
7. Bowen N. Know the new rules? 6-6-6, 50m penalties, kick-in rule explained. afl.com.au: Australian Football League; 2019 [Available from: https://www.afl.com.au/news/121022/know-the-new-rules-6-6-6-50m-penalties-kick-in-rule-explained].
8. Norton KI, editor. Evolution of rule changes and coaching tactics in Australian Football: impact on game speed, structure and injury patterns. Science and Football VII: The Proceedings of the Seventh World Congress on Science and Football; 2013; Routledge, Abingdon, Oxon.
9. Robertson S, Gupta R, McIntosh S. A method to assess the influence of individual player performance distribution on match outcome in team sports. Journal of Sports Sciences. 2016;34(19):1893–900. doi: 10.1080/02640414.2016.1142106
10. Spencer B, Jackson K, Bedin T, Robertson S. Modelling the quality of player passing decisions in Australian Rules football relative to risk, reward and commitment. Frontiers in Psychology. 2019;10:1777. doi: 10.3389/fpsyg.2019.01777
11. Jackson K. Assessing Player Performance in Australian Football Using Spatial Data. Melbourne: Swinburne University of Technology; 2016.
12. Vella A, Clarke AC, Kempton T, Ryan S, Holden J, Coutts AJ. Possession chain factors influence movement demands in elite Australian football match-play. Science and Medicine in Football. 2020:1–7.
13. Taylor N, Gastin PB, Mills O, Tran J. Network analysis of kick-in possession chains in elite Australian football. Journal of Sports Sciences. 2020;38(9):1053–61. doi: 10.1080/02640414.2020.1740490
14. Alexander JP, Spencer B, Sweeting AJ, Mara JK, Robertson S. The influence of match phase and field position on collective team behaviour in Australian Rules football. Journal of Sports Sciences. 2019;37(15):1699–707. doi: 10.1080/02640414.2019.1586077
15. Agrawal K, Garg S, Sharma S, Patel P. Development and validation of OPTICS based spatio-temporal clustering technique. Information Sciences. 2016;369:388–401.
16. Kriegel HP, Kröger P, Sander J, Zimek A. Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2011;1(3):231–40.
17. Babichev S, Durnyak B, Pikh I, Senkivskyy V, editors. An evaluation of the objective clustering inductive technology effectiveness implemented using density-based and agglomerative hierarchical clustering algorithms. International Scientific Conference “Intellectual Systems of Decision Making and Problem of Computational Intelligence”; 2019: Springer.
18. Malzer C, Baum M, editors. A hybrid approach to hierarchical density-based cluster selection. 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI); 2020: IEEE.
19. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.
20. Saha R. Homonym Identification using BERT -- Using a Clustering Approach. arXiv preprint arXiv:2101.02398. 2021.
21. Zhu Y, Ting KM, Jin Y, Angelova M. Hierarchical clustering that takes advantage of both density-peak and density-connectivity. Information Systems. 2022;103:101871.
22. Carey DL, Ong K, Morris ME, Crow J, Crossley KM. Predicting ratings of perceived exertion in Australian football players: methods for live estimation. International Journal of Computer Science in Sport. 2016;15(2):64–77.
23. Grira N, Crucianu M, Boujemaa N. Unsupervised and semi-supervised clustering: a brief survey. A review of machine learning techniques for processing multimedia content. 2004;1:9–16.
24. Barrionuevo GO, Ríos S, Williams SW, Ramos-Grez JA, editors. Comparative Evaluation of Machine Learning Regressors for the Layer Geometry Prediction in Wire arc Additive manufacturing. 2021 IEEE 12th International Conference on Mechanical and Intelligent Manufacturing Technologies (ICMIMT); 2021: IEEE.
25. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
26. Cust EE, Sweeting AJ, Ball K, Robertson S. Classification of Australian football kick types in-situation via ankle-mounted inertial measurement units. Journal of Sports Sciences. 2021:1–9. doi: 10.1080/02640414.2020.1868678
27. Whitehead S, Till K, Jones B, Beggs C, Dalton-Barron N, Weaving D. The use of technical-tactical and physical performance indicators to classify between levels of match-play in elite rugby league. Science and Medicine in Football. 2020:1–7.
28. Bransen L, Van Haaren J, editors. Measuring football players’ on-the-ball contributions from passes during games. International workshop on machine learning and data mining for sports analytics; 2018: Springer.
29. Lundberg SM, Erion GG, Lee S-I. Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888. 2018.
30. Carter JV, Pan J, Rai SN, Galandiuk S. ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves. Surgery. 2016;159(6):1638–45. doi: 10.1016/j.surg.2015.12.029
31. Baboota R, Kaur H. Predictive analysis and modelling football results using machine learning approach for English Premier League. International Journal of Forecasting. 2019;35(2):741–55.
32. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27(8):861–74.
33. Sofaer HR, Hoeting JA, Jarnevich CS. The area under the precision‐recall curve as a performance metric for rare binary events. Methods in Ecology and Evolution. 2019;10(4):565–77.
34. Johnston RJ, Watsford ML, Pine MJ, Spurrs RW, Murphy A, Pruyn EC. Movement demands and match performance in professional Australian football. International Journal of Sports Medicine. 2012;33(02):89–93. doi: 10.1055/s-0031-1287798
35. Alexander JP, Bedin T, Jackson KB, Robertson S. Team numerical advantage in Australian rules football: A missing piece of the scoring puzzle? PLoS One. 2021;16(7):e0254591. doi: 10.1371/journal.pone.0254591
36. AFL Tables 2021 [Available from: https://afltables.com/afl/seas/2021.html].
37. Wallace JL, Norton KI. Evolution of World Cup soccer final games 1966–2010: Game structure, speed and play patterns. Journal of Science and Medicine in Sport. 2014;17(2):223–8. doi: 10.1016/j.jsams.2013.03.016
38. Norton K. Match analysis in AFL, Soccer and Rugby Union: patterns, trends and similarities: Routledge; 2013.
39. Pappalardo L, Cintia P. Quantifying the relation between performance and success in soccer. Advances in Complex Systems. 2018;21(03n04):1750014.
40. Ensum J, Pollard R, Taylor S. Applications of logistic regression to shots at goal at association football: Calculation of shot probabilities, quantification of factors and player/team. Journal of Sports Sciences. 2004;22(6):500–20.

PLoS One. doi: 10.1371/journal.pone.0272657.r001

Decision Letter 0

Gábor Vattay

21 Apr 2022

PONE-D-22-01015

Quantifying congestion with player tracking data in Australian Football

PLOS ONE

Dear Dr. Alexander,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Gábor Vattay, PhD, DSc

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Comments to the authors

General comments

Thanks for allowing me to review this manuscript. Overall, the study is of very high quality and the findings are applicable and relevant to those working within AFL. The two methods presented to quantify congestion are novel and highly innovative, requiring extensive analyses, so I commend you all on that. A major strength of the study was the ability of the authors to explain the findings in simple, practical terms, useful for practitioners as the analyses would be unfamiliar to most readers. Although I have some understanding of machine learning, much of the analyses were unfamiliar to me, so I can’t comment too much about the appropriateness; however, the figures were easy to interpret. The manuscript is very well written and explains the concept of the study exceptionally well, and, as mentioned, the findings are clearly explained. My suggestions/comments are mostly related to some additional detail that I believe is necessary within the methods, which are detailed below. Thanks again, and congrats on completing such a high quality manuscript.

Specific comments

Abstract

Line 40: Consider adding ‘method’ between second and aimed. To me it read like an aim of the study, when really you’re referring to the method

Introduction

Overall, the introduction is really well written and provides context to the reader about the purpose of the study

Line 75-76: Where you refer to the ‘speed’ of the game, you need a bit more detail around this. Specifically, are you referring to m/min (assuming so), but I can imagine many readers may assume speed as in velocity of running and perhaps high-speed running volumes. Provide descriptive data in line 77.

Methods

Line 109: Why was 2020 not included? Was this due to the reduced duration of games?

Line 112: Did you obtain data from all teams in the AFL or one team? How did you obtain this data if it involved multiple teams?

Line 113: The S5 devices aren’t LPS enabled, these devices would have been worn in the 2019 season, but Vector would have been used in 2020-2021. In 2019, Catapult Clearsky (LPS devices) were worn in Marvel stadium. Can you confirm this information, as well as detail/report any between-device information that’s necessary here.

Line 133: Please provide detail on your methods of determining player location data. I’m assuming you used GNSS lat/long data, but please include details on this within the methods

Figure 1: The quality of the figure seems quite poor in the PDF. I can see it’s a tif file, but perhaps check it in the next submission. Also, I wonder if it’s necessary to include the top panel here – for simplicity?

Line 193: State where that data is presented – Figure 2 and 3

Line 214: Did the same Champion Data staff label/code the disposals?

Line 233: Should the Shapley Additive exPlanations package be in italics like the others previously reported?

Results

Overall, the results explain the findings really well. My only suggestion would be with the figures to make them look a bit ‘cleaner’ and publication worthy. For example, removing the grey section around them, lighten (or remove) the gridlines, position the legend in a consistent spot. Also the colour scheme used across each figure varies quite a bit. I understand the colours are used to represent different things, but perhaps consider using a similar theme

Discussion

The discussion was well written, and explained the results in simple, applied/practical terms. This is really useful for readers, as the analysis is very complex, the findings need to be interpreted clearly for the translation of this study into practice.

Line 314: Perhaps link the first two paragraphs together.

Line 382: Can you state the reasoning for this in the methods

References

Ref 2, 14, 34, 35 journal title needs to be in capital format

Reviewer #2: The authors are to be commended for a well-written article. It is the first known article that proposes objective data analysis techniques to capture the congestion in Australian football. Both approaches seem to provide more effective information than current methods (i.e., manual input). This not only seems to be useful to evaluate congestion in AF, but also may be practical for many other collective sports-related phenomena. Generally, the topic falls within the scope of the journal and could be of potential interest to its readers.

However, the following minor concerns need to be addressed before publication:

Due to the large amount of abbreviations a list of abbreviations would be recommended, as long as journal’s guidelines approve it, to facilitate the reading.

Abstract:

Ln34: Australian instead of Australia

*Ln35: “is an important consideration in identifying passing capacity, assessing fan enjoyment, and evaluating the effect of rule changes”. You mention these aspects as important when studying congestion but they are not mentioned elsewhere in the article aside from the abstract. Why? In my opinion, if you consider them important they should be mentioned in the introduction or discussion; otherwise I would remove them from the abstract.

Ln54: Congestion is already in the title. I would suggest switching it for another key word to avoid repeating it. This may help the article to be found in more searches.

Materials and methods:

*Ln203 to Ln214: As I see it, the definitions of level of congestion and spatiotemporal features for the disposal classification model were made by yourselves without any evidence but in consultation with professionals. I understand there is no work that establishes previous criteria about it. But why these levels and not others? As you may understand, these definitions may be somewhat relative to the individual and environmental constraints of the game. For example, for nearby congestion “multiple players with 0-10m of ball-carrier but there is some space to make a decision”: within this 10 m an experienced and fresh player will probably have time to make a clear decision; however, a novice and fatigued young player on a rainy day may well feel high congestion in this space when trying to make an “adequate” decision. I do not intend to change your definitions but to help future readers consider different features for classifying the dynamic and nonlinear level of congestion, first assessing the main constraints of the game (e.g., age of players, level of players, meteorology, etc.). In my opinion, this should be briefly considered as a practical implication (in the discussion) so that future studies do not treat it as a universal and fixed rule.
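
To make the distance-based part of the 'nearby congestion' definition concrete, the following is a minimal sketch of one spatiotemporal feature of the kind that could feed a disposal classifier: the number of players within a fixed radius of the ball-carrier, computed from Cartesian coordinates in metres. The function name `players_within`, the default 10 m radius as a parameter, and the positions are assumptions made for illustration; this is not the feature set used in the study, and the radius is exposed as a parameter precisely because a fixed threshold may not suit every context.

```python
import math


def players_within(ball_carrier, others, radius_m=10.0):
    """Count players whose (x, y) position lies within radius_m of the ball-carrier."""
    bx, by = ball_carrier
    return sum(1 for (x, y) in others if math.hypot(x - bx, y - by) <= radius_m)


# Hypothetical positions in metres, invented for illustration only.
carrier = (0.0, 0.0)
nearby_players = [(3.0, 4.0), (6.0, 7.0), (10.0, 0.1), (-2.0, -1.0), (30.0, 5.0)]

count = players_within(carrier, nearby_players)
print(count)  # players closer than 10 m to the ball-carrier
```

Changing `radius_m` (for example to reflect playing level or conditions) changes the count without altering the rest of the pipeline.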

Discussion:

Ln310: Why do you not use the abbreviation for Australian Football (AF) here?

*Ln315: Why do you use approximately? Is there no exact value for it?

Ln347: Very interesting finding. This reinforces previous articles finding that collective coordination dynamics decrease across the periods of the match, highly influenced by effort accumulation. See:

Duarte, R., Araújo, D., Folgado, H. et al. Capturing complex, non-linear team behaviours during competitive football performance. J Syst Sci Complex 26, 62–72 (2013). https://doi.org/10.1007/s11424-013-2290-3

This may suggest that the data analysis techniques you used could be proposed as complementary methods of analysis to approach effort accumulation and acute fatigue effects in collective sports.

Also, your data analysis techniques may be applied to capture congestion not only in competition but also in training when simulating different environments similar to matches. For example, when manipulating players’ space of interaction (Ric et al., 2017) or temporary numerical imbalances (Cantón et al., 2019). Indeed, this study offers objective tools with high applicability in collective sports.

Ric A, Torrents C, Gonçalves B, Torres-Ronda L, Sampaio J, Hristovski R (2017) Dynamics of tactical behaviour in association football when manipulating players' space of interaction. PLoS ONE 12(7): e0180773. https://doi.org/10.1371/journal.pone.0180773

Canton, A., Torrents, C., Ric, A., Gonçalves, B., Sampaio, J., & Hristovski, R. (2019). Effects of Temporary Numerical Imbalances on Collective Exploratory Behavior of Young and Professional Football Players. Frontiers in psychology, 10, 1968. https://doi.org/10.3389/fpsyg.2019.01968

Ln380: Moreover, this study provides data analysis techniques that take into account the coordination dynamics properties of teams. From a complex systems based-approach it seems more adequate than using isolated and timeless methods (Montull et al., 2022) to assess not only congestion but also, as I mentioned, other sport-related phenomena influencing the collective behaviour of teams as effort, match strategies, numerical imbalances, etc. In this sense, other methods of analysis based on coordination dynamic properties, such as Uncontrolled manifold to assess synergies or network analysis, may help to approach congestion and related phenomena in future research as well.

Montull, L., Slapšinskaitė-Dackevičienė, A., Kiely, J. et al. Integrative Proposals of Sports Monitoring: Subjective Outperforms Objective Monitoring. Sports Med - Open 8, 41 (2022). https://doi.org/10.1186/s40798-022-00432-z

Conclusion:

Ln391: “approximately 25%”: as I mentioned above, can this not be described with its exact value?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Heidi Thornton

Reviewer #2: Yes: Lluc Montull

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 8;17(8):e0272657. doi: 10.1371/journal.pone.0272657.r002

Author response to Decision Letter 0

29 Jun 2022

Reviewer #1: Comments to the authors

General comments

Specific comments

Abstract

Line 40: Consider adding ‘method’ between second and aimed. To me it read like an aim of the study, when really you’re referring to the method

Line 40 has been amended to include “method” in between second and aimed.

Introduction

Overall, the introduction is really well written and provides context to the reader about the purpose of the study.

Line 75-76: Where you refer to the ‘speed’ of the game, you need a bit more detail around this. Specifically, are you referring to m/min (assuming so), but I can imagine many readers may assume speed as in velocity of running and perhaps high-speed running volumes. Provide descriptive data in line 77.

Line 76-77 has been amended to “Preliminary investigations involving the evolution of match-play in AF promoted the notion that the speed of the game was increasing, which was estimated by measuring the average velocity of the ball in m/s.”

Methods

Line 109: Why was 2020 not included? Was this due to the reduced duration of games?

Yes, the shortened duration of game time made it challenging to compare across seasons. We have included reference to this in lines 111-112 “Matches played in 2020 were not included due to the season alterations that were implemented because of Covid-19.”

Line 112: Did you obtain data from all teams in the AFL or one team? How did you obtain this data if it involved multiple teams?

Champion Data are the official provider to the AFL and are involved in this study. They collect tracking data on all AFL teams via an existing arrangement. Spatiotemporal data was limited to games played at Marvel Stadium only as it’s the only location that offers LPS technology.

Line 113: The S5 devices aren’t LPS enabled, these devices would have been worn in the 2019 season, but Vector would have been used in 2020-2021. In 2019, Catapult Clearsky (LPS devices) were worn in Marvel Stadium. Can you confirm this information, as well as detail/report any between-device information that’s necessary here.

Yes, the authors have altered the device details to ClearSky. The authors have amended lines 115-117 to update the tracking device details and the corresponding data format (Cartesian coordinates): “Positional data in the form of Cartesian coordinates for each match were gathered using Catapult ClearSky 10 Hz local-positioning system (LPS) devices for all 44 participants (Catapult Sports, Melbourne, Australia).” The authors do not have any information on between-device differences. However, both the S5 and Vector units utilise the same ClearSky (LPS) technology.

Line 133: Please provide detail on your methods of determining player location data. I’m assuming you used GNSS lat/long data, but please include details on this within the methods

Positional data was gathered using LPS technology located at Marvel Stadium. Cartesian coordinates were used, which did not require a conversion. The authors have included more information regarding this in lines 115-117 to emphasise LPS technology and Cartesian coordinates: “Positional data in the form of Cartesian coordinates for each match were gathered using Catapult ClearSky 10 Hz local-positioning system (LPS) devices for all 44 male participants (Catapult Optimeye S5, Catapult Innovations, Melbourne, Australia)”.

Figure 1: The quality of the figure seems quite poor in the PDF. I can see it’s a tif file, but perhaps check it in the next submission. Also, I wonder if it’s necessary to include the top panel here – for simplicity?

We processed the figures using the journal guidelines. We discovered in a previous submission that figures in print return a higher resolution.

The authors consider the top panel to be necessary to display how the initial cluster labels (top panel) are transformed into the corresponding descriptive labels (bottom panel). We outline these steps in lines 184-188: “Clusters of congestion were originally assigned output labels of 0-n, while -1 was assigned to all points clustered as noise (see Fig 1). These labels were converted to provide a practical description of congestion and to differentiate between separate clusters of congestion if more than one cluster was identified for a unique time interval (see Fig 1).”
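
The relabelling step quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it takes the raw labels for one time interval (0-n for clusters, -1 for noise) and returns the descriptive labels ‘primary’, ‘secondary’, and ‘outside’ used in the paper. The function name `describe_labels` is hypothetical, and the rule that the most populous cluster is ‘primary’ (ties going to the lowest cluster id) is an assumption, since the quoted passage does not state how the primary cluster is chosen.

```python
from collections import Counter

NOISE = -1  # label assigned to points that belong to no cluster


def describe_labels(labels):
    """Map raw cluster labels for one time interval to descriptive labels."""
    sizes = Counter(label for label in labels if label != NOISE)
    if not sizes:
        # No cluster was found in this interval: every player is outside congestion.
        return ["outside"] * len(labels)
    # Assumption: the largest cluster is 'primary'; ties go to the lowest cluster id.
    primary = min(sizes, key=lambda c: (-sizes[c], c))
    described = []
    for label in labels:
        if label == NOISE:
            described.append("outside")
        elif label == primary:
            described.append("primary")
        else:
            described.append("secondary")
    return described


# Toy raw labels for nine players at one time interval (illustrative only).
raw = [0, 0, 1, 1, 1, -1, -1, 0, 1]
described = describe_labels(raw)
print(described)
```

Keeping the raw labels alongside the descriptive ones preserves the distinction between separate secondary clusters when more than two clusters occur in the same interval.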

Line 193: State where that data is presented – Figure 2 and 3

Line 200 has been amended to include where data is presented “field position (see Fig 2) and quarter (see Fig 3).”

Line 214: Did the same Champion Data staff label/code the disposals?

Yes, the same staff members were used when labelling the training datasets and developing the spatiotemporal features. We have included a reference to this in lines 216-217: “In consultation with the same match analysts from Champion Data.”

Line 233: Should the Shapley Additive exPlanations package be in italics like the others previously reported?

The authors have altered line 240 to display ‘Shapley Additive exPlanations’ in italics to align with other python packages reported.

Results

Overall, the results explain the findings really well. My only suggestion would be with the figures to make them look a bit ‘cleaner’ and publication worthy. For example, removing the grey section around them, lighten (or remove) the gridlines, position the legend in a consistent spot. Also the colour scheme used across each figure varies quite a bit. I understand the colours are used to represent different things, but perhaps consider using a similar theme

All figures have been amended to remove the grey frame, adjust the gridlines to be more transparent (lightened), and the legend is in a consistent location. The authors maintain that a different colour scheme is helpful when differentiating between the models.

Discussion

Line 314: Perhaps link the first two paragraphs together.

Lines 320-321 have been added to link the first and second paragraphs of the discussion: “This information provides a scalable method to quantify congestion during matches.”

Line 382: Can you state the reasoning for this in the methods

LPS technology was only available at matches played at Marvel Stadium. Therefore, the analysis was limited to these matches only. We have included more detail explaining the data collection process in lines 112-117 “To ensure consistent tracking data and uniform field dimensions, matches (n = 56) were played at a single stadium (Marvel Stadium, Melbourne, Australia) where the field dimensions were 159.5 m x 128.8 m (length x width). Positional data in the form of Cartesian coordinates for each match were gathered using Catapult ClearSky 10 Hz local-positioning system (LPS) devices for all 44 male participants (Catapult Sports, Melbourne, Australia).”

References

Ref 2, 14, 34, 35 journal title needs to be in capital format

We have formatted these references to capitalise the journal title.

However, the following minor concerns need to be addressed before publication:

Due to the large amount of abbreviations a list of abbreviations would be recommended, as long as journal’s guidelines approve it, to facilitate the reading.

We thank the reviewer for the comments. After examining the journal guidelines, it appears a list of abbreviations would not align with the journal guidelines. The authors will gladly oblige the request if the journal allows it.

Abstract:

Ln34: Australian instead of Australia

Line 34 has been amended to “Australian” instead of “Australia”.

*Ln35: “is an important consideration in identifying passing capacity, assessing fan enjoyment, and evaluating the effect of rule changes”. You mention these aspects as important when studying congestion but they are not mentioned elsewhere in the article aside from the abstract. Why? In my opinion, if you consider them important they should be mentioned in the introduction or discussion; otherwise I would remove them from the abstract.

The authors have adjusted the discussion to be more explicit in referencing the aspects in the abstract. Lines 342-344 now highlight the impact on passing capacity: “Overall, more than 60% of disposals encountered high congestion or nearby congestion. This suggests that large segments of match-play experiences greater density around the ball-carrier, which may instil pressure and influence passing capacity”. Passing capacity is also referenced in lines 346-348 “Disposals performed under low congestion decreased as teams transitioned the ball towards their attacking end”.

We have amended lines 358-360 to comment on fan enjoyment “In response to a steadily declining scoring rate and a predominantly defensive game style, the AFL, guided by the Laws of the Game Charter, have continually implemented major rule changes to enhance fan enjoyment.”

Rule changes are referred to in lines 366-368 “such rule changes constrain the defending team’s ability to restrict ball transition, thereby allowing for more attacking ball movement for the offensive team, which may increase scoring.”

Ln54: Congestion is already in the title. I would suggest switching it for another keyword to avoid repeating it. This may help the article to be found in more searches.

Line 54 has been amended to remove “Congestion” from the key words

Materials and methods:

*Ln203 to Ln214: As I understand it, the definitions of the level of congestion and the spatiotemporal features for the disposal classification model were developed by yourselves, without prior evidence but in consultation with professionals. I understand that no previous work establishes criteria for this. But why these levels and not others? As you may appreciate, these definitions may be relative to the individual and environmental constraints of the game. For example, for nearby congestion “multiple players with 0-10m of ball-carrier but there is some space to make a decision”: within this 10 m, an experienced and fresh player will probably have time to make a clear decision, whereas a novice, fatigued young player on a rainy day may experience this space as high congestion when trying to make an “adequate” decision. I do not intend to change your definitions, but to help future readers consider different features for classifying the dynamic and nonlinear level of congestion, after first assessing the main constraints of the game (e.g., age of players, level of players, meteorology, etc.). In my opinion, this should be briefly considered as a practical implication (in the discussion) so that future studies do not treat these definitions as a universal and fixed rule.

We thank the reviewer for the comments. The authors had similar discussions when formulating the methodology. As you mentioned, due to the paucity of studies investigating congestion, it was difficult to use empirical evidence. As such, we developed the spatiotemporal features with professional analysts at Champion Data due to their domain expertise.

We have further included a brief discussion in lines 394-399 that outlines the limitations and future considerations “The machine learning models proposed in this study to quantify congestion were novel, which naturally specifies the parameters used to tune the algorithms were likewise organic. Although the models were thoroughly trialed and tested using various input parameters, a greater implementation from a broader range of experts may assist in ensuring the methodology is valid and reliable and if alterations need to be tailored for specific applications.”
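To make the reviewer's point about threshold sensitivity concrete, the following is a minimal sketch of one distance-based feature: a count of players within a fixed radius of the ball-carrier. The function name `players_within`, the sample coordinates, and the use of the radius as a tunable parameter are assumptions for illustration only; they are not the feature definitions from Table 2 of the manuscript.

```python
import math


def players_within(ball_carrier, others, radius_m=10.0):
    """Count players whose straight-line distance to the ball-carrier is <= radius_m.

    Hypothetical helper: positions are (x, y) Cartesian coordinates in metres.
    """
    bx, by = ball_carrier
    return sum(1 for (x, y) in others if math.hypot(x - bx, y - by) <= radius_m)


# Illustrative positions only (metres); not taken from the study's data.
carrier = (80.0, 60.0)
others = [(83.0, 64.0), (85.0, 60.0), (100.0, 60.0), (80.0, 71.0)]

print(players_within(carrier, others))                  # players within 10 m
print(players_within(carrier, others, radius_m=12.0))   # same frame, wider radius
```

Widening the radius from 10 m to 12 m changes the count for the same frame, which is the kind of sensitivity that motivates tailoring thresholds to specific applications.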

Discussion:

Ln310: Why do you not use the abbreviation for Australian Football (AF) here?

Line 317 has been amended to include the abbreviation “This study developed two methods to measure congestion in AF.”

*Ln315: Why do you use approximately? Is there no exact value for it?

Each of the three clusters of congestion is summarised across field position (4) and quarter (4). Thus, we have 24 individual combinations of congestion. Across these combinations, the proportion ranged between 23% and 26%. This has been included in lines 322-323 “The first method showed that players are within a cluster of congestion (primary or secondary) between 23% and 26% of a typical game”.
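The figure of 24 follows from pairing each of the three cluster labels with the four field positions and, separately, with the four quarters, i.e. 3 × (4 + 4) = 24. A minimal sketch of that enumeration is given below; the field-position and quarter labels are placeholders assumed for illustration, and only the counts come from the response above.

```python
from itertools import product

# Cluster labels as named in the abstract.
clusters = ["primary", "secondary", "outside"]

# Hypothetical placeholder labels; only the counts (4 and 4) come from the text.
field_positions = ["zone_1", "zone_2", "zone_3", "zone_4"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# Each cluster is summarised by field position and, separately, by quarter.
combinations = list(product(clusters, field_positions)) + list(product(clusters, quarters))

print(len(combinations))  # 3 * (4 + 4) = 24
```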

Ln347: Very interesting finding. This reinforces previous articles showing that collective coordination dynamics decrease across the periods of the match, highly influenced by effort accumulation. See: Duarte, R., Araújo, D., Folgado, H. et al. Capturing complex, non-linear team behaviours during competitive football performance. J Syst Sci Complex 26, 62–72 (2013). https://doi.org/10.1007/s11424-013-2290-3

This may suggest that the data analysis techniques you used could be proposed as complementary methods for assessing effort accumulation and acute fatigue effects in collective sports.

Thank you for recommending these articles. The authors discussed player fatigue and how it may influence coordination dynamics. These articles will be very useful in designing future investigations that view congestion within a complex systems framework.

Conclusion:

Ln391: “approximately 25%”, as I mentioned above, cannot be described with its exact value?

Line 396 has been amended to “The clustering model identified that players were within a cluster of congestion between 23% and 26% of a typical game.”

Attachment

Submitted filename: Response to Reviewers.docx

PLoS One. doi: 10.1371/journal.pone.0272657.r003

Decision Letter 1

Gábor Vattay

25 Jul 2022

Quantifying congestion with player tracking data in Australian Football

PONE-D-22-01015R1

Dear Dr. Alexander,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Gábor Vattay, PhD, DSc

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Before resubmission, please correct the few typos and the missing information mentioned by the reviewer.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: General comments

Great work on making the suggested changes from both reviewers. The manuscript has substantially improved in quality and is very close to being published in my opinion. I have just a few remaining queries and suggestions, that are detailed below.

Specific comments

Abstract: Looks great

Introduction:

Line 66: Consumption of what? Be more specific here or consider using a different word. Reads like consumption of food

Line 77: Please use the correct units of measurement/notation here for m/s, so m·s⁻¹

Materials and methods

Line 188: Figure caption is here, but no figure?

Results

The figures look much cleaner now without the gridlines

Discussion

Line 392: Explain what you mean by organic in this context

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Heidi Compton

Reviewer #2: Yes: Lluc Montull

**********

PLoS One. doi: 10.1371/journal.pone.0272657.r004

Acceptance letter

Gábor Vattay

29 Jul 2022

PONE-D-22-01015R1

Quantifying Congestion with player tracking data in Australian Football

Dear Dr. Alexander:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Gábor Vattay

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File

(ZIP)

Data Availability Statement

All relevant data are within the manuscript and its Supporting Information files.

References

1. Gray AJ, Jenkins DG. Match analysis and the physiological demands of Australian football. Sports Medicine. 2010;40(4):347–60. doi: 10.2165/11531400-000000000-00000

2. Robertson S, Back N, Bartlett JD. Explaining match outcome in elite Australian Rules football using team performance indicators. Journal of Sports Sciences. 2016;34(7):637–44. doi: 10.1080/02640414.2015.1066026

3. Mason RJ, Farrow D, Hattie JA. An analysis of in-game feedback provided by coaches in an Australian Football League competition. Physical Education and Sport Pedagogy. 2020;25(5):464–77.

4. Woods CT, Robertson S, Collier NF. Evolution of game-play in the Australian Football League from 2001 to 2015. Journal of Sports Sciences. 2017;35(19):1879–87. doi: 10.1080/02640414.2016.1240879

5. Norton KI, Craig N, Olds T. The evolution of Australian football. Journal of Science and Medicine in Sport. 1999;2(4):389–404. doi: 10.1016/s1440-2440(99)80011-5

6. Lane JC, van der Ploeg G, Greenham G, Norton K. Characterisation of offensive and defensive game play trends in the Australian Football League (1999–2019). International Journal of Performance Analysis in Sport. 2020;20(4):557–68.

7. Bowen N. Know the new rules? 6-6-6, 50m penalties, kick-in rule explained. afl.com.au: Australian Football League; 2019 [Available from: https://www.afl.com.au/news/121022/know-the-new-rules-6-6-6-50m-penalties-kick-in-rule-explained].

8. Norton KI, editor. Evolution of rule changes and coaching tactics in Australian Football: impact on game speed, structure and injury patterns. Science and Football VII: The Proceedings of the Seventh World Congress on Science and Football; 2013; Routledge, Abingdon, Oxon.

9. Robertson S, Gupta R, McIntosh S. A method to assess the influence of individual player performance distribution on match outcome in team sports. Journal of Sports Sciences. 2016;34(19):1893–900. doi: 10.1080/02640414.2016.1142106

10. Spencer B, Jackson K, Bedin T, Robertson S. Modelling the quality of player passing decisions in Australian Rules football relative to risk, reward and commitment. Frontiers in Psychology. 2019;10:1777. doi: 10.3389/fpsyg.2019.01777

11. Jackson K. Assessing Player Performance in Australian Football Using Spatial Data. Melbourne: Swinburne University of Technology; 2016.

12. Vella A, Clarke AC, Kempton T, Ryan S, Holden J, Coutts AJ. Possession chain factors influence movement demands in elite Australian football match-play. Science and Medicine in Football. 2020:1–7.

13. Taylor N, Gastin PB, Mills O, Tran J. Network analysis of kick-in possession chains in elite Australian football. Journal of Sports Sciences. 2020;38(9):1053–61. doi: 10.1080/02640414.2020.1740490

14. Alexander JP, Spencer B, Sweeting AJ, Mara JK, Robertson S. The influence of match phase and field position on collective team behaviour in Australian Rules football. Journal of Sports Sciences. 2019;37(15):1699–707. doi: 10.1080/02640414.2019.1586077

15. Agrawal K, Garg S, Sharma S, Patel P. Development and validation of OPTICS based spatio-temporal clustering technique. Information Sciences. 2016;369:388–401.

16. Kriegel HP, Kröger P, Sander J, Zimek A. Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2011;1(3):231–40.

17. Babichev S, Durnyak B, Pikh I, Senkivskyy V, editors. An evaluation of the objective clustering inductive technology effectiveness implemented using density-based and agglomerative hierarchical clustering algorithms. International Scientific Conference “Intellectual Systems of Decision Making and Problem of Computational Intelligence”; 2019: Springer.

18. Malzer C, Baum M, editors. A hybrid approach to hierarchical density-based cluster selection. 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI); 2020: IEEE.

19. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.

20. Saha R. Homonym Identification using BERT - Using a Clustering Approach. arXiv preprint arXiv:2101.02398. 2021.

21. Zhu Y, Ting KM, Jin Y, Angelova M. Hierarchical clustering that takes advantage of both density-peak and density-connectivity. Information Systems. 2022;103:101871.

22. Carey DL, Ong K, Morris ME, Crow J, Crossley KM. Predicting ratings of perceived exertion in Australian football players: methods for live estimation. International Journal of Computer Science in Sport. 2016;15(2):64–77.

23. Grira N, Crucianu M, Boujemaa N. Unsupervised and semi-supervised clustering: a brief survey. A review of machine learning techniques for processing multimedia content. 2004;1:9–16.

24. Barrionuevo GO, Ríos S, Williams SW, Ramos-Grez JA, editors. Comparative Evaluation of Machine Learning Regressors for the Layer Geometry Prediction in Wire arc Additive manufacturing. 2021 IEEE 12th International Conference on Mechanical and Intelligent Manufacturing Technologies (ICMIMT); 2021: IEEE.

25. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.

26. Cust EE, Sweeting AJ, Ball K, Robertson S. Classification of Australian football kick types in-situation via ankle-mounted inertial measurement units. Journal of Sports Sciences. 2021:1–9. doi: 10.1080/02640414.2020.1868678

27. Whitehead S, Till K, Jones B, Beggs C, Dalton-Barron N, Weaving D. The use of technical-tactical and physical performance indicators to classify between levels of match-play in elite rugby league. Science and Medicine in Football. 2020:1–7.

28. Bransen L, Van Haaren J, editors. Measuring football players’ on-the-ball contributions from passes during games. International workshop on machine learning and data mining for sports analytics; 2018: Springer.

29. Lundberg SM, Erion GG, Lee S-I. Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888. 2018.

30. Carter JV, Pan J, Rai SN, Galandiuk S. ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves. Surgery. 2016;159(6):1638–45. doi: 10.1016/j.surg.2015.12.029

31. Baboota R, Kaur H. Predictive analysis and modelling football results using machine learning approach for English Premier League. International Journal of Forecasting. 2019;35(2):741–55.

32. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27(8):861–74.

33. Sofaer HR, Hoeting JA, Jarnevich CS. The area under the precision‐recall curve as a performance metric for rare binary events. Methods in Ecology and Evolution. 2019;10(4):565–77.

34. Johnston RJ, Watsford ML, Pine MJ, Spurrs RW, Murphy A, Pruyn EC. Movement demands and match performance in professional Australian football. International Journal of Sports Medicine. 2012;33(02):89–93. doi: 10.1055/s-0031-1287798

35. Alexander JP, Bedin T, Jackson KB, Robertson S. Team numerical advantage in Australian rules football: A missing piece of the scoring puzzle? PLoS One. 2021;16(7):e0254591. doi: 10.1371/journal.pone.0254591

36. AFL Tables 2021 [Available from: https://afltables.com/afl/seas/2021.html].

37. Wallace JL, Norton KI. Evolution of World Cup soccer final games 1966–2010: Game structure, speed and play patterns. Journal of Science and Medicine in Sport. 2014;17(2):223–8. doi: 10.1016/j.jsams.2013.03.016

38. Norton K. Match analysis in AFL, Soccer and Rugby Union: patterns, trends and similarities: Routledge; 2013.

39. Pappalardo L, Cintia P. Quantifying the relation between performance and success in soccer. Advances in Complex Systems. 2018;21(03n04):1750014.

40. Ensum J, Pollard R, Taylor S. Applications of logistic regression to shots at goal at association football: Calculation of shot probabilities, quantification of factors and player/team. Journal of Sports Sciences. 2004;22(6):500–20.
