How Can Artificial Neural Networks Transform Player Recruitment in Football?

A study that uses artificial neural networks (ANNs) to objectively identify key performance indicators (KPIs) in professional soccer, predict players’ subsequent league status, and support recruitment decisions based on technical performance data.

Apr 16, 2024

Introduction

The research paper starts by highlighting the evolution of scouting and recruitment practices in professional soccer, particularly in response to UEFA’s Club Licensing and Financial Fair Play Regulations introduced in 2010. The modernization of scouting processes has become crucial for elite clubs aiming to minimize financial losses from player trading and improve overall success. Despite decades of research into factors associated with success in soccer, including physical demands and talent identification, there is still a gap in understanding key performance indicators that differentiate successful players and teams.

Early research focused on the physical demands of soccer, but it became apparent that physical parameters alone do not fully explain playing success. Criticism of youth academies for a bias towards physically mature players, known as the ‘relative age effect,’ underscored the need to shift focus towards skills and development. Consequently, there has been growing interest in exploring technical factors that contribute to playing success, but research in this area remains limited.

Traditional statistical techniques like regression and discriminant analysis have been prevalent in soccer research, but there is increasing interest in utilizing artificial neural networks (ANNs) for performance analysis. ANNs can handle complex, non-linear data and do not require the data to be normally distributed. However, their application in soccer research is still in its early stages.

The paper aims to address this gap by developing an objective model to identify key performance indicators influencing outfield players’ league status using an artificial neural network. By assessing a wide range of variables objectively and analyzing a larger sample size, the study seeks to establish factors linked with career progression in professional soccer. This objective model could provide valuable support for assessing potential transfer targets and complement subjective assessments by coaches and scouts, contributing to the advancement of scouting and recruitment practices in the sport.

Materials and Methods

Players and match data

In the materials and methods section, the research details the dataset and methodology used for the study. The study involved 966 outfield players who completed the full 90 minutes in 1104 matches played in the English Football League Championship during the 2008/09 and 2009/10 seasons. The players had a mean age of 25 years and a mean height of 1.81 meters.

Technical performance data and biographical data were collected for each player using ProZone’s MatchViewer software, which compiled 335 performance variables. These variables included metrics such as passes, tackles, possessions regained, clearances, and shots, with information on total number, accuracy percentage, means, medians, and quartiles. The MatchViewer system recorded key variables related to actions performed during matches, such as event type, time, and players involved.

Initially, the dataset contained 505 variables, but those with low variance were removed before analysis. The data provided by STATS LLC was supplemented with additional data collected from the official Football League and Scout7 Ltd websites, including variables such as total appearances, playing percentage, goals, assists, international appearances, and heights.
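The low-variance filter mentioned above can be sketched as follows. This is only an illustration: the cutoff value and the variable names are assumptions, since the summary does not state the threshold the authors used.

```python
from statistics import pvariance

def drop_low_variance(columns, threshold=1e-3):
    """Keep only variables whose population variance exceeds `threshold`.

    `columns` maps a variable name to its list of per-player values.
    The threshold is an assumed value, not one taken from the study.
    """
    return {name: values for name, values in columns.items()
            if pvariance(values) > threshold}

# Hypothetical variables: one constant, one informative.
data = {
    "own_goals_mean": [0.0, 0.0, 0.0, 0.0],
    "passes_mean": [31.2, 45.0, 28.4, 52.1],
}
kept = drop_low_variance(data)
print(sorted(kept))  # ['passes_mean']
```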

Each player’s match-by-match data for the 335 performance variables was converted into a mean to represent their average 90-minute performance, and they were then categorized accordingly. Institutional ethical approval was obtained from the Non-Invasive Human Ethics Committee at Nottingham Trent University, ensuring compliance with ethical standards in data collection and analysis.
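A minimal sketch of the averaging step, assuming each record holds one player's figures for one full 90-minute appearance. The player identifiers and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def average_per_player(match_rows):
    """Collapse match-by-match rows into one mean profile per player.

    Each row is (player_id, {variable_name: value}) for one full
    90-minute appearance.
    """
    per_player = defaultdict(lambda: defaultdict(list))
    for player_id, stats in match_rows:
        for name, value in stats.items():
            per_player[player_id][name].append(value)
    return {player: {name: mean(values) for name, values in stats.items()}
            for player, stats in per_player.items()}

rows = [
    ("player_a", {"passes": 40, "tackles": 2}),
    ("player_a", {"passes": 50, "tackles": 4}),
    ("player_b", {"passes": 22, "tackles": 6}),
]
profiles = average_per_player(rows)
print(profiles["player_a"])
```

Each player ends up with a single row of means, which is the form a classifier needs when the unit of prediction is the player rather than the match.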

Player grouping

The study categorizes players into three groups based on their match time allocation in the subsequent season. The first group (Group 0) consists of players who predominantly played in a lower league, with a sample size of 209 and a mean of 10 ± 10 90-minute appearances. The second group (Group 1) comprises players who predominantly played in the English Football League Championship in the following season, with 637 players and a mean of 18 ± 12 90-minute appearances. The third group (Group 2) consists of players who progressed to play most of their matches in the English Premier League, with 120 players and a mean of 19 ± 12 90-minute appearances.
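The grouping rule can be illustrated with a short sketch. It assumes that "predominantly" means the league in which a player logged the most playing time in the following season; the league labels and the tie-breaking behaviour are assumptions, not details from the paper.

```python
def league_group(minutes_by_league):
    """Return 0, 1 or 2 for the league with the most next-season minutes.

    Keys are assumed labels: 'lower', 'championship', 'premier'.
    Ties resolve to whichever league appears first in the mapping.
    """
    codes = {"lower": 0, "championship": 1, "premier": 2}
    top = max(minutes_by_league, key=minutes_by_league.get)
    return codes[top]

# Hypothetical next-season minutes for three players.
print(league_group({"lower": 900, "championship": 270}))     # 0
print(league_group({"championship": 1800, "premier": 90}))   # 1
print(league_group({"championship": 450, "premier": 1620}))  # 2
```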

To ensure balanced sample sizes for comparison, random selection was used to choose 209 players from Group 1. The study then employed a Stepwise Artificial Neural Network approach to identify the optimal set of variables for predicting players’ league status. The three categories were analyzed through pairwise comparisons, with the neural network contrasting two groups at a time to discern the key variables contributing to players’ league status.
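The two steps in this paragraph can be sketched as random undersampling followed by greedy forward selection. The sketch assumes that the stepwise approach adds one variable at a time and keeps whichever addition lowers the test error most; the scoring function here is a hypothetical stand-in for training and evaluating a network on each candidate subset.

```python
import random

def undersample(players, size, seed=0):
    """Randomly pick `size` players without replacement."""
    return random.Random(seed).sample(players, size)

def forward_stepwise(variables, score, max_vars):
    """Greedy forward selection.

    `score(subset)` returns a test error for a model trained on that
    subset (lower is better). At each step the variable that lowers the
    error most is added; selection stops when nothing improves.
    """
    selected, best_error = [], float("inf")
    while len(selected) < max_vars:
        candidates = [v for v in variables if v not in selected]
        if not candidates:
            break
        trial_error, trial_var = min((score(selected + [v]), v) for v in candidates)
        if trial_error >= best_error:
            break
        selected.append(trial_var)
        best_error = trial_error
    return selected, best_error

# Balance Group 1 (637 players) down to 209.
group_1 = list(range(637))
balanced = undersample(group_1, 209)

# Hypothetical scoring: each variable removes a fixed amount of error.
usefulness = {"u21_caps": 0.2, "tackles": 0.1, "height": 0.0}
toy_score = lambda subset: 0.5 - sum(usefulness[v] for v in subset)
selected, error = forward_stepwise(list(usefulness), toy_score, max_vars=10)
print(selected)  # ['u21_caps', 'tackles']
```

In a real run, `score` would train a small network on the chosen columns and return its average held-out error, so the selection is driven by predictive value rather than by a fixed lookup.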

Artificial neural network model

The artificial neural network (ANN) model employed in the study followed a methodology previously applied successfully to gene profiling with breast cancer data. To begin, the dataset was divided randomly into three subsets: 60% for training, 20% for validation, and 20% for independent testing. This split guards against overfitting and supports the generalizability of the model. The random split was repeated under a Monte Carlo cross-validation procedure, which is reported to give more consistent results than alternatives such as leave-one-out cross-validation.
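A minimal sketch of Monte Carlo cross-validation with the 60/20/20 proportions described above. The number of repeats and the seed are assumptions, and the case count of 418 simply reflects two balanced groups of 209.

```python
import random

def monte_carlo_splits(n_cases, repeats, seed=0):
    """Yield (train, validation, test) index lists, reshuffled each repeat."""
    rng = random.Random(seed)
    n_train = n_cases * 60 // 100
    n_val = n_cases * 20 // 100
    for _ in range(repeats):
        order = list(range(n_cases))
        rng.shuffle(order)
        yield (order[:n_train],
               order[n_train:n_train + n_val],
               order[n_train + n_val:])

# Two balanced groups of 209 players give 418 cases per comparison.
splits = list(monte_carlo_splits(418, repeats=5))
train, val, test = splits[0]
print(len(train), len(val), len(test))  # 250 83 85
```

Unlike k-fold or leave-one-out schemes, every repeat draws a fresh random partition, so a case can land in the test subset several times or not at all; averaging over many repeats smooths out that variation.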

The ANN architecture utilized a multi-layer perceptron (MLP) with a back-propagation algorithm. This algorithm, employing a sigmoidal transfer function, updated weights based on error feedback. The learning rate, determining the proportion of weight updates relative to error, was set at 0.1, while the momentum, governing the proportion of the previous weight change applied to the current change, was 0.5. The architecture featured two hidden nodes, acting as feature detectors, in a single hidden layer.
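The architecture described above can be sketched in plain Python. This is an illustrative toy rather than the authors' implementation: the squared-error loss, the weight initialisation range, the online (per-case) updates, and the one-feature dataset are all assumptions. The learning rate of 0.1, momentum of 0.5, sigmoid units, and two hidden nodes follow the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer, sigmoid units, online back-propagation with momentum."""

    def __init__(self, n_inputs, n_hidden=2, lr=0.1, momentum=0.5, seed=0):
        rng = random.Random(seed)
        # Each row holds the input weights plus a trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.dw_hidden = [[0.0] * (n_inputs + 1) for _ in range(n_hidden)]
        self.dw_out = [0.0] * (n_hidden + 1)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_one(self, x, target):
        xb, hb, out = self.forward(x)
        delta_out = (target - out) * out * (1.0 - out)
        # Hidden deltas use the output weights from before this update.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j in range(len(self.w_out)):
            # New change = learning-rate step + momentum * previous change.
            self.dw_out[j] = (self.lr * delta_out * hb[j]
                              + self.momentum * self.dw_out[j])
            self.w_out[j] += self.dw_out[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                self.dw_hidden[j][i] = (self.lr * delta_hidden[j] * xb[i]
                                        + self.momentum * self.dw_hidden[j][i])
                row[i] += self.dw_hidden[j][i]

def rmse(net, data):
    return math.sqrt(sum((t - net.predict(x)) ** 2 for x, t in data) / len(data))

# Toy data: one feature in [-1, 1], class 1 when the feature is positive.
data = [([i / 10.0], 1.0 if i > 0 else 0.0) for i in range(-10, 11) if i != 0]
net = TinyMLP(n_inputs=1)
before = rmse(net, data)
for _ in range(300):
    for x, t in data:
        net.train_one(x, t)
after = rmse(net, data)
print(round(before, 3), round(after, 3))
```

The momentum term reuses half of the previous weight change, which damps oscillation between successive cases while keeping the step size modest.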

To prevent overfitting, training was capped at 300 epochs (updates of the network) and stopped early if 100 epochs passed without improvement on the test set. This framework kept the model from being fine-tuned excessively to the training data. Two performance metrics were reported: the average test performance, which is the percentage of correctly predicted test cases, and the average test error, which is the root mean square error between predicted and actual values in the test dataset. These metrics served as benchmarks for evaluating the model’s predictive accuracy and generalization capability.
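The stopping rule and the two reported metrics can be sketched as below. The error sequence is a hypothetical stand-in for evaluating a network after each epoch, and the 0.5 classification cutoff is an assumption.

```python
import math

def best_epoch_with_early_stopping(epoch_errors, max_epochs=300, patience=100):
    """Return (best_epoch, best_error) under a max-epoch cap and a patience window.

    `epoch_errors` yields one held-out error per epoch.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(epoch_errors, start=1):
        if epoch > max_epochs:
            break
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_error

def percent_correct(predictions, targets, cutoff=0.5):
    """Percentage of cases whose predicted class matches the actual class."""
    hits = sum((p >= cutoff) == (t >= cutoff) for p, t in zip(predictions, targets))
    return 100.0 * hits / len(targets)

def root_mean_square_error(predictions, targets):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets))
                     / len(targets))

# Hypothetical error curve: improves for three epochs, then plateaus.
errors = [0.5, 0.4, 0.3] + [0.35] * 200
result = best_epoch_with_early_stopping(errors)
print(result)  # (3, 0.3)
print(percent_correct([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1]))  # 50.0
```

Averaging `percent_correct` and `root_mean_square_error` over repeated random splits would give figures of the same kind as the average test performance and average test error quoted in the results.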

Results

The results of the analysis using artificial neural networks (ANNs) revealed varying degrees of success in distinguishing between player groups based on their subsequent league status. For the comparison between players in group 0 (lower league) and group 1 (Championship), the ANN achieved a prediction accuracy of 67.9% with an error of 10.8%, using a combination of nine variables. Playing percentage and the success rate of backward passes were the most influential variables in this model.

The comparison between group 1 and group 2 (Premier League) players produced the weakest model: a prediction accuracy of 61.5% with an error of 11.6%, using a combination of seven variables. The percentage of unsuccessful headers and the number of possessions played the largest roles here, but at this level of accuracy the ANN did not yield a suitable model for separating the two groups.

“**Table 3.** Results for group 1 v group 2 balanced data set (Best Average Test Performance = 61.5% and Best Average Test Error = 11.6% with a combination of seven variables) and group 1 v group 2 model variables as means and standard deviations for player groupings.”

Interestingly, the ANN did succeed in developing a robust model for distinguishing between players in group 2 and group 0. This model accurately predicted the league status of 78.8% of the test group players with an error of 8.3%, utilizing ten variables. U21 caps, senior international caps, and tackles were identified as the most significant variables in this model, highlighting the importance of international experience and defensive capabilities in progressing to the Premier League.

“**Table 4.** Results for the group 0 v group 2 balanced data set (Best Average Test Performance = 78.8% and Best Average Test Error = 8.3% with a combination of ten variables) and group 0 v group 2 model variables as means and standard deviations for player groupings.”

These findings underscore the complexity of predicting players’ league status based on performance metrics and biographical data. While certain variables appear to be influential in determining career trajectory, the interplay between different factors remains intricate. Nonetheless, the ANN demonstrates potential as a tool for identifying key indicators associated with players’ career progression in professional soccer, offering valuable insights for talent identification and recruitment strategies.

Discussion

The paper reflects on the primary objective of the study, which aimed to develop an objective model for identifying key performance indicators (KPIs) in professional soccer that influence outfield players’ league status, using artificial neural networks (ANNs). The study analyzed the performances of 966 players, categorizing them into three groups based on their subsequent playing level in the English professional soccer structure.

The choice of ANNs for this research is justified by their capacity to offer highly accurate predictive methods for complex datasets, particularly in dealing with non-linear data, which is common in sports analytics. Moreover, ANNs provide an objective approach to identifying KPIs, contrasting with subjective methods that have been historically employed in similar studies.

The group 0 v group 2 model predicted, with 78.8% test accuracy, which players would move up to the Premier League and which would drop to a lower level. The models for the other two comparisons were considerably less accurate (67.9% and 61.5%), indicating limits to the approach when adjacent playing levels are compared.

Overall, the discussion highlights the potential of ANNs as a valuable tool for talent identification and recruitment in professional soccer, particularly in objectively identifying KPIs associated with players’ career progression. However, it also acknowledges the need for further research to refine the ANN models and address the limitations observed in the study’s predictions.

Artificial neural network architecture

The artificial neural network (ANN) architecture employed in the study used a constrained design with only two hidden nodes. This design choice aimed to mitigate the risk of overfitting and to support the reliability and generalizability of the model. Initial weights were set with a small variance to further guard against overfitting and reduce the likelihood of false discovery.

The decision to limit the number of hidden nodes and layers was deliberate, as increasing complexity by adding more nodes or layers was observed to lead to longer training times and a decrease in performance on unseen data. This phenomenon indicated a loss of generality in the classifiers, emphasizing the importance of maintaining a balance between model complexity and generalizability.

To maximize generality and prevent overfitting, the models were developed using a Monte Carlo cross-validation approach combined with early stopping and multiple repeats. This approach helped ensure that the models were robust and reliable across different datasets.

The learning rates and momentum parameters were set at 0.1 and 0.5, respectively. While these parameters had a minor impact on the performance of the classifiers, they were tuned to optimize the training process and enhance the overall effectiveness of the ANN architecture.

Overview of models

The comparison between players dropping down to a lower playing level (Group 0) and those progressing to play in the English Premier League (Group 2) yielded a notably stronger model compared to other comparisons. This model achieved a commendable accuracy of 78.8% in predicting the league status of players in the test dataset. The robust performance of the neural network in this specific comparison underscores its effectiveness in discerning between players at different levels of playing ability.

The logic behind the superior performance of the neural network in this comparison is intuitive. Players transitioning to the Premier League and those moving down to a lower division represent extremes in terms of playing ability. Therefore, the distinctions between these two groups are likely more pronounced, making them easier for the neural network to identify.

Correctly classifying 78.8% of players in this model is a notable result. This performance surpasses that of models developed in prior research to classify performance in cricket, which suggests that the neural network approach adopted in this study is competitive with methods used in similar domains.

Key variables in group 0 v group 2 model

The key variables identified in the model comparing players from Group 0 (lower league) and Group 2 (Premier League) shed light on various aspects influencing players’ career trajectories in professional soccer.

International experience emerged as a significant factor, with players who progressed to the Premier League boasting higher numbers of international caps at both senior and Under-21 levels compared to their counterparts in lower leagues. This underscores the role of national associations in identifying and nurturing talent from a young age, potentially leading to favorable opportunities for players with international recognition early in their careers.

Interestingly, defensive variables such as tackles and possessions gained exhibited unexpected patterns. Contrary to previous research, players in Group 0 demonstrated higher averages for median tackles and minimum possessions gained. This discrepancy may be attributed to the specific context of the study, as previous research has focused on international and European competitions. The findings suggest that successful players exhibit a keen ability to anticipate opposition movements and execute crucial defensive actions, reflecting the importance of defensive contributions in modern soccer.

Passing variables also emerged as influential factors, particularly the percentage of unsuccessful first-time passes. Players progressing to the Premier League displayed lower rates of unsuccessful first-time passes, indicating their proficiency in maintaining possession and breaking down defensive lines. This highlights the evolving nature of passing strategies in soccer, with successful players demonstrating adeptness in executing passes effectively to create scoring opportunities.

Moreover, metrics such as the mean number of possessions and median penalty area entries were indicative of players’ success levels. Premier League-bound players exhibited a higher mean number of possessions and more median penalty area entries, suggesting greater involvement in matches and an ability to create goal-scoring opportunities. These findings align with previous research emphasizing the importance of possession and penetration into the opposition penalty area in achieving success in soccer.

Overall, the identified key variables provide valuable insights into the factors driving players’ progression to higher playing levels in professional soccer. The integration of advanced statistical techniques such as artificial neural networks enables a comprehensive analysis of player performance data, revealing nuanced relationships that may not be discernible through traditional statistical methods. Moving forward, further research is warranted to explore additional facets of player performance and their implications for career advancement in soccer.

Study limitations

The study acknowledges several limitations that should be addressed in future research endeavors. Firstly, the analysis was conducted across three discrete player groups without considering playing positions. Previous research has demonstrated significant variations in playing profiles among different positions, including differences in physical output, defensive contributions, and involvement in attacking aspects. Given this, future studies should account for positional differences to provide a more nuanced understanding of recruitment indicators within the Football League Championship.

Secondly, the study lacked information regarding the physical capabilities and performance of the players involved. While extensive physical performance data is collected on players during testing protocols, training sessions, and matches, this information was not available for inclusion in the current study due to privacy concerns. However, previous research has highlighted the importance of physical indicators in influencing players’ technical performance and overall match outcomes. Incorporating such physical performance data into future study designs could enhance the comprehensiveness of the research and potentially improve the accuracy of predictive models.

Addressing these limitations would contribute to a more comprehensive understanding of the recruitment dynamics in association football, allowing for more accurate identification of key indicators driving player progression and success within the Football League Championship. Additionally, future research efforts could explore the interplay between technical and physical performance indicators to gain deeper insights into the multifaceted nature of player recruitment and development in professional soccer.

Conclusions

The study concludes by affirming the feasibility of using artificial neural networks to identify performance indicators that influence a player’s league status and accurately predict their career trajectory in professional soccer. It emphasizes the potential of this approach in enhancing the efficiency and accuracy of the scouting and recruitment process within the sport.

The findings underscore the importance of further research to refine and expand upon the current model. Specifically, future investigations should focus on conducting more position-specific analyses and incorporating both physical and technical performance data to improve the accuracy of predictive models. By addressing these aspects, researchers can develop a more comprehensive understanding of the factors influencing player progression and success in soccer.

Ultimately, the study envisions the development of a systematic process for accurately predicting a player’s future playing status based on performance data. Such an objective tool could significantly enhance the scouting and recruitment process by reducing inaccuracies and biases associated with subjective assessments. By leveraging artificial neural networks, clubs and organizations can make more informed decisions when selecting key players and assessing potential transfer targets, thereby optimizing their recruitment strategies in professional soccer.

Barron, D., Ball, G., Robins, M., & Sunderland, C. (2018). Artificial neural networks and player recruitment in professional soccer. PLoS ONE, 13(10), e0205818. https://doi.org/10.1371/journal.pone.0205818