Enhancing Pass Receivers Prediction with Convolutional Spatial Relations Learning

A convolutional architecture that learns representations over spatial relations in soccer, efficiently predicting individual passes between players and capturing complex gameplay patterns.

Jul 26, 2023

Introduction

In this paper, the authors focus on predicting individual passes between players during a match based on static snapshots of each pass situation, which includes information about ball possession and the locations of all players involved. The authors approach the problem from a geometrical perspective, treating each situation as an independent, static viewpoint of the game.

To enable generalization across different situations, they enrich the data by adding soccer-specific contextual locations and convert the absolute positions of players into relative distances. By doing so, the model can reason about the mutual spatial patterns between players rather than their absolute positions on the field. These spatial patterns are represented using convolutional filters, which capture inherent symmetries and geometrical regularities dictated by the rules of the game.
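The switch from absolute positions to relative distances can be illustrated with a minimal sketch. The function name and the coordinates below are hypothetical and are not taken from the paper; the point is only that two snapshots that differ by a shift of the whole scene yield the same relative description.

```python
import math

def relative_distances(sender, others):
    """Return Euclidean distances from the sender to every other player."""
    sx, sy = sender
    return [math.hypot(x - sx, y - sy) for x, y in others]

# Two made-up snapshots: the second is the first one shifted across the field.
snapshot_a = relative_distances((10.0, 20.0), [(13.0, 24.0), (10.0, 30.0)])
snapshot_b = relative_distances((50.0, 40.0), [(53.0, 44.0), (50.0, 50.0)])

print(snapshot_a)  # [5.0, 10.0]
print(snapshot_b)  # [5.0, 10.0]
```

Because both snapshots map to the same distances, a model trained on this representation sees them as the same spatial pattern.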

The enriched spatial patterns are then aggregated using pooling techniques and combined in a fully connected manner to explore their relations. Notably, the authors emphasize end-to-end learning, aiming to minimize assumptions and reliance on expert knowledge in their approach.

By adopting this geometrical perspective and utilizing convolutional filters, the model has the potential to learn complex spatial dependencies between players, which are crucial in predicting passes accurately during a soccer match. This approach offers a data-driven solution to the prediction challenge, allowing the model to leverage inherent patterns in the data and learn predictive features directly from the input without relying heavily on pre-defined rules or expert domain knowledge.

Related Work

This section of the paper provides an overview of previous research relevant to the task of predicting soccer passes and related soccer analytics.

One notable approach mentioned is an Inductive Logic Programming (ILP) model trained on qualitative spatial representations. This method was applied to predict soccer passes, demonstrating the potential of logic-based techniques in this domain. Additionally, a similar approach was employed to discover offensive patterns in soccer matches, highlighting the versatility of ILP in soccer analytics.

Spatio-temporal data were also utilized in other studies to infer teams’ play-styles and examine the likelihood of scoring a goal from a shot. This suggests that analyzing the movement of players and the ball over time can provide valuable insights into team strategies and scoring opportunities.

Furthermore, a physics-based model of soccer ball motion was applied to predict the receiver of a pass. This approach suggests that incorporating physical dynamics can improve pass prediction accuracy.

Overall, the related work section demonstrates the diverse range of methodologies employed in soccer analytics, including logic-based models, spatio-temporal analysis, and physics-based approaches. Each of these methods contributes to the understanding and prediction of soccer passes and other aspects of the game.

Dataset

Next, the authors present the data used for the soccer pass prediction task, detailing its size and characteristics.

The dataset comprises 12,124 soccer passes, of which 10,045 were successful, meaning that the sender and receiver belonged to the same team. The study focuses solely on predicting successful passes, in line with previous research.

Unlike previous work, this dataset contains only static snapshots of the game, represented by the coordinates of all 22 players on the field. Consequently, the situations are treated as independent of each other, making the prediction task more challenging due to the lack of information about players’ momentum, orientation in space, and continuity across situations. Additionally, the dataset includes timestamps indicating when the pass was sent and received, but since the goal is predictive modeling, the timestamp of the pass receipt is excluded from the analysis.

In some cases, only 21 players’ coordinates are present, likely due to a player being sent off during the game. To handle this missing data, the researchers substituted large surrogate numbers for the missing coordinates, rendering the position meaningless for the prediction task.
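One way to picture the handling of a missing player is the sketch below. The surrogate value, the team size argument, and the function name are assumptions made for illustration; the source does not state the exact number the researchers used.

```python
# Assumed placeholder value; the source only says "large numbers".
SURROGATE = 1e6

def pad_team(players, size=11, fill=(SURROGATE, SURROGATE)):
    """Pad a team's coordinate list to a fixed size with far-away placeholders."""
    missing = size - len(players)
    if missing < 0:
        raise ValueError("more players than expected")
    return list(players) + [fill] * missing

# A hypothetical team reduced to ten players after a sending-off.
ten_men = [(float(i), 10.0) for i in range(10)]
padded = pad_team(ten_men)
print(len(padded))  # 11
```

A placeholder this far from the field is never among the closest players to anyone, so it plausibly contributes nothing once the model concentrates on nearby players.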

Predictive model

In the modeling section of the paper, the authors introduce a neural architecture designed for the soccer pass prediction task. The model is composed of convolutional layers, max-pooling, and fully connected layers with a softmax output.

The neural architecture leverages convolutional filters, each responsible for extracting specific context and features from the static game snapshots. These filters are designed to capture various aspects, such as the level of occupation of the potential receiving player, the pressure on the sender of the pass, and the positional relationship between the receiver and their teammates on the field. By extracting these features, the model aims to understand the spatial patterns and relationships that contribute to the success of a pass.

The max-pooling layer plays a crucial role in making the model agnostic to the specific positioning and ordering of players. This enables the model to generalize better, as the importance of the closest players is emphasized, considering that only a few players are typically relevant to each pass.

The softmax output layer is used to encode the exclusive outcomes of each situation. Given that only one pass is executed at a time, the softmax function naturally assigns probabilities to each potential pass receiver, allowing the model to identify the most likely receiver for a given pass situation.
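As a reminder of what the softmax does here, the following sketch turns one raw score per potential receiver into probabilities that sum to one. The scores are made up for illustration and do not come from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three potential receivers.
probs = softmax([2.0, 1.0, 0.1])
most_likely = probs.index(max(probs))
print(most_likely)  # 0
```

Raising the score of one receiver necessarily lowers the probability of the others, which matches the fact that only one pass is played in each situation.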

Table 1. Enriching spatial snapshots with contextual locations.

Knowledge Representation

The researchers then describe the format of the raw data and how it is transformed into a suitable representation for pass prediction in soccer.

The raw data is presented in a tabular format, where each row corresponds to a specific pass situation during a game. For each pass situation, the table provides the x-y coordinates of the 22 players on the field and identifies the sender of the pass (ps) by providing its x-y coordinates as well.

To predict successful passes, the paper views each snapshot from the perspective of potential successful passes between the ball-possessing player (ps) and all their teammates (potential receivers, pr). For each situation, there are ten pairs of players (ps, pr), where pr belongs to the set of teammates excluding the sender (ps). The pairs are defined differently depending on the position of the sender (ps) on the field.
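The construction of the ten (ps, pr) pairs can be sketched as follows. The function name and the toy coordinates are hypothetical; the sketch only shows the pairing itself, not the position-dependent details mentioned above.

```python
def candidate_pairs(team, sender_index):
    """Pair the sender with each of the other teammates."""
    sender = team[sender_index]
    return [(sender, p) for i, p in enumerate(team) if i != sender_index]

# Eleven made-up teammate positions; player 4 has the ball.
team = [(float(i), 0.0) for i in range(11)]
pairs = candidate_pairs(team, sender_index=4)
print(len(pairs))  # 10
```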

To facilitate the pass prediction task, the authors preprocess the data by enriching the pairs with various static and dynamic field locations. These enriched pairs serve as learning examples for the predictive model. The distances between players are measured based on these key locations, as detailed in Table 1 of the paper. This enriched representation enables the model to reason about mutual spatial patterns and distances between players, crucial for predicting successful passes in soccer.
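The enrichment step might look roughly like the sketch below. The actual locations are listed in the paper's Table 1, which is not reproduced in this article, so the pitch size (105 x 68 m) and the two goal centres used here are stand-ins chosen purely for illustration, as are all names.

```python
import math

# Hypothetical field and landmarks; the paper's real list is in its Table 1.
FIELD_LENGTH, FIELD_WIDTH = 105.0, 68.0
LANDMARKS = {
    "own_goal": (0.0, FIELD_WIDTH / 2),
    "opponent_goal": (FIELD_LENGTH, FIELD_WIDTH / 2),
}

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def enrich_pair(sender, receiver):
    """Describe a (ps, pr) pair by distances instead of absolute coordinates."""
    features = {"sender_receiver": dist(sender, receiver)}
    for name, loc in LANDMARKS.items():
        features[f"sender_{name}"] = dist(sender, loc)
        features[f"receiver_{name}"] = dist(receiver, loc)
    return features

example = enrich_pair((0.0, 34.0), (30.0, 34.0))
print(example["receiver_opponent_goal"])  # 75.0
```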

Neural Architecture

The proposed neural model predicts successful passes in soccer. The architecture employs convolutional layers, max-pooling, and fully connected layers with a softmax output to capture spatial relations and make joint pass predictions.

The input to the model comprises the spatial relations between potential sender (ps) and receiver (pr) players, as described previously. These relations are transformed into feature maps, which serve as inputs to various convolutional filters. Each filter represents a different viewpoint on the pass, such as cover of the receiver, pressure on the sender, or alternative passing options available to the sender. The filters are instantiated multiple times for different variables iterating over opponents and teammates of the sender. Ordering of players is enforced within each filter, resulting in 1D feature maps.

Table 2. Conformation of spatial relations into convolutional filters.

The filter values are then aggregated using max-pooling. Global pooling is applied over all instantiations of each filter, focusing on the closest players and suppressing noise from the rest. To capture complex spatial patterns, wider filters of size 3x2 and 3x3 are used to consider pairs of remaining players for cover and pressure.
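The interplay of filters and global max-pooling can be sketched in plain Python. The layout assumed here (one row per feature, one column per instantiated player, filters sliding along the player axis) is a reading of the description above, and the weights are hand-picked rather than learned.

```python
def conv_and_pool(feature_map, kernel, bias=0.0):
    """Slide a (features x width) kernel along the instantiation axis,
    then keep only the largest response (global max-pooling)."""
    n_features = len(feature_map)
    n_inst = len(feature_map[0])
    width = len(kernel[0])
    responses = []
    for start in range(n_inst - width + 1):
        total = bias
        for f in range(n_features):
            for k in range(width):
                total += kernel[f][k] * feature_map[f][start + k]
        responses.append(total)
    return max(responses)

# Three made-up features for four opponents, sorted by the first feature
# (think of it as the distance to the receiver).
feature_map = [
    [2.0, 5.0, 9.0, 14.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
kernel_3x1 = [[-1.0], [0.0], [0.0]]            # looks at one player at a time
kernel_3x2 = [[-1.0, -1.0], [0.0, 0.0], [0.0, 0.0]]  # looks at neighbouring pairs

print(conv_and_pool(feature_map, kernel_3x1))  # -2.0
print(conv_and_pool(feature_map, kernel_3x2))  # -7.0
```

With a negative weight on distance, the pooled value is determined by the closest opponent (or the closest pair), while more distant players drop out, which is the noise-suppressing effect described above.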

The patterns extracted from the filters are fed into the fully connected layers. These layers combine the different patterns to determine the potential of each individual pass (ps, pr) based on the sender’s decision-making logic and incorporating the relational contexts of the receiver. The model learns to weight the importance of each pattern in the combinations.

To enable joint reasoning over all possible passes, the softmax output is used. This allows the model to make exclusive predictions for each pass situation as part of the learning process, avoiding the need for post-processing normalization over separate examples. The model’s architecture thus enables it to capture spatial relations and make joint predictions efficiently for successful passes in soccer.

Fig. 1. Architecture of the neural model. Four feature maps of size #features × #instantiations × #possibilities are at the input. Filters of size 3 × 1, 3 × 2 and 3 × 3 are applied to each feature map. The outputs of the convolution are reduced by max pooling and merged with the f1 − f9 features providing their static context. Finally, 2 dense layers with 3, respectively 1, neurons are applied to each possibility. For clarity only 3 out of 10 possibilities (depth dimension) are displayed.
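The final stage of the architecture, two dense layers with 3 and 1 neurons shared across all possibilities and followed by a softmax, can be sketched as below. The tanh activation, the weights, and the two-value feature vectors (standing in for the pooled filter outputs merged with the static features) are assumptions for illustration only.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def score_pass(features, w1, b1, w2, b2):
    """Score one (ps, pr) possibility with a 3-neuron and a 1-neuron layer."""
    hidden = [math.tanh(h) for h in dense(features, w1, b1)]  # assumed activation
    return dense(hidden, w2, b2)[0]

def predict(possibilities, w1, b1, w2, b2):
    """Apply the same layers to every possibility, then softmax jointly."""
    scores = [score_pass(f, w1, b1, w2, b2) for f in possibilities]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary (not learned) weights and three made-up possibilities.
w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, -1.0, 0.5]]
b2 = [0.0]
possibilities = [[1.0, 2.0], [1.0, 2.0], [3.0, 0.5]]
probs = predict(possibilities, w1, b1, w2, b2)
```

Because the dense layers are shared, two possibilities with identical features receive identical probabilities, and the softmax ties all candidates of one situation together.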

Experiments

The authors conducted experiments to evaluate the performance of their proposed model for predicting successful soccer passes. They used 10-fold cross-validation and evaluated the model based on two metrics: mean reciprocal rank (MRR) and the frequency of the actual receiver of the pass being among the top three predicted receivers.
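Both metrics are straightforward to compute from the predicted probabilities. The sketch below uses hypothetical function names and toy predictions, not results from the paper.

```python
def rank_of(probs, actual):
    """Rank (1 = best) of the actual receiver among all candidates."""
    ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranking.index(actual) + 1

def evaluate(predictions, actuals, k=3):
    """Return mean reciprocal rank and top-k accuracy."""
    ranks = [rank_of(p, a) for p, a in zip(predictions, actuals)]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    top_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, top_k

# Three toy situations with four candidates each; the actual receivers
# end up at ranks 1, 2 and 4.
predictions = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.4, 0.3, 0.2, 0.1],
]
actuals = [0, 2, 3]
mrr, top3 = evaluate(predictions, actuals)
```

MRR rewards placing the actual receiver near the top even when it is not the first choice, which is why it complements plain top-1 accuracy.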

The authors compared their model’s performance with a previous approach, which utilized both static and dynamic features derived from the flow of the game. However, the dynamic features were unavailable for the authors’ model, so they focused on comparing their results with the “Static” model from the previous work. Surprisingly, their proposed model outperformed not only the Static model but also the “Combined” model that used both static and dynamic features, as demonstrated in Table 3.

These results suggest that their proposed neural architecture, which leverages spatial relations and convolutional filters to capture various viewpoints of the pass, performs exceptionally well even without access to dynamic game features. The model’s ability to outperform previous methods indicates its effectiveness in predicting successful soccer passes based on static spatial information alone.

Human-level Performance

In this section, the authors aimed to assess the human-level performance for the task of predicting successful soccer passes. They were particularly interested in understanding how the missing dynamic context of the game, which humans would normally get from standard visual recordings, affects the predictive ability of humans. To achieve this, they measured and averaged the predictive performance of three soccer enthusiasts who were presented with a sample of 200 randomly selected pass situations.

To facilitate the measurements, the authors created a simple interactive visualization tool. The results indicated that the task of predicting successful passes was challenging even for humans. While the top-1 accuracy of the proposed model and humans was comparable, the top-3 accuracy and mean reciprocal rank (MRR) metrics demonstrated that humans performed better in ranking the alternative pass options.

These findings suggest that the absence of dynamic context, which humans usually rely on from visual recordings, had a notable impact on their predictive ability. Despite the comparable top-1 accuracy, humans exhibited superior performance in ranking the potential pass receivers, indicating the significance of the dynamic context in human pass prediction.

Discussion

The authors conducted a thorough analysis of the model’s predictions and provided insightful observations. They acknowledged a key weakness of the model, which tended to consider only a limited number of viable pass options, even when alternative options were quite similar. The use of softmax in combination with cross-entropy loss during the network training might have contributed to this behavior, leading to an emphasis on certain passes while overlooking other potential choices. The authors suggested that using a ranking loss instead could be a more suitable approach.
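The difference between the two kinds of loss can be made concrete with a small sketch. The source does not say which ranking loss the authors had in mind, so the pairwise hinge form below is an assumption, and the scores are invented.

```python
import math

def cross_entropy(probs, actual):
    """Negative log-probability of the actual receiver (used with softmax)."""
    return -math.log(probs[actual])

def pairwise_hinge(scores, actual, margin=1.0):
    """Assumed ranking loss: penalize every alternative that is not scored
    at least `margin` below the actual receiver."""
    return sum(max(0.0, margin - (scores[actual] - s))
               for i, s in enumerate(scores) if i != actual)

# The actual receiver (index 0) is ranked first, but a similar option is close.
scores = [2.0, 1.5, -1.0]
ranking_loss = pairwise_hinge(scores, actual=0)
print(ranking_loss)  # 0.5
```

A loss of this kind stops pushing alternatives down once they are ranked clearly below the actual receiver, whereas cross-entropy keeps rewarding the model for concentrating probability on a single option.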

The model demonstrated strength in identifying uncovered teammates, sometimes even overestimating them as pass options. However, it tended to favor passes to the sidelines, even when the ball had most likely just come from those positions. In comparison, human intuition appeared to excel in capturing the underlying “flow” of the game, suggesting that the model could benefit from incorporating human-like reasoning in certain aspects.

The authors provided a visualization of an example situation, revealing the difficulty of the pass prediction task. Without information about the sender’s orientation on the field, the model considered numerous viable alternatives, and human intuition could have potentially prioritized different options.

Fig. 2. Example model prediction. Possible passlines are depicted by yellow lines, with the actual pass marked by red. The percentages near the passlines show the predicted probabilities.

Furthermore, the authors examined the static context features and the convolutional filters of the model separately. Although these feature sets were designed to work together to provide context, the separate experiments showed that the convolutional features were more valuable than the static context features in isolation.

Overall, the discussion provided valuable insights into the model’s strengths and weaknesses, as well as the potential for improving pass prediction by addressing certain aspects of the model architecture and incorporating more human-like reasoning.

Conclusion

In conclusion, the research paper presented a neural model designed for soccer pass prediction based on static spatial snapshots of the game. The model utilized a set of carefully designed convolutional filters to extract various relational contexts from each game situation, focusing on the mutual positions of players on the field. The authors explained how this architecture could enable the learning of complex relational patterns through the aggregation of simple spatial relations.

The model’s performance was evaluated on a sizable dataset of captured soccer passes. The results demonstrated promising outcomes, indicating that the proposed approach has the potential to be effective in predicting successful passes in the game of soccer. The research contributes to the field of predictive sport analytics and showcases the capabilities of neural architectures in analyzing complex spatial relationships and making accurate predictions in sports scenarios.

Hubáček, O., Šourek, G., & Železný, F. (2019). Deep learning from spatial relations for soccer pass prediction. In Machine Learning and Data Mining for Sports Analytics: 5th International Workshop, MLSA 2018, Co-located with ECML/PKDD 2018, Dublin, Ireland, September 10, 2018, Proceedings 5 (pp. 159–166). Springer International Publishing. https://www.researchgate.net/profile/Gustav-Sir/publication/332256995_Deep_Learning_from_Spatial_Relations_for_Soccer_Pass_Prediction/links/60cc60fa92851ca3acabcfc1/Deep-Learning-from-Spatial-Relations-for-Soccer-Pass-Prediction.pdf