Can Player Positioning Be Rated Without Expert Knowledge?
Using Spatiotemporal Data and Deep Reinforcement Learning to Evaluate Football Situations.
The following summary critically reviews the research paper titled “Learning to Rate Player Positioning in Soccer” by Uwe Dick and Ulf Brefeld. All data, figures, and analysis presented here are drawn from their original work; I do not claim any authorship or ownership of the content. This summary has been written to provide a concise and technically informed synthesis of the paper’s findings, methodologies, and implications, while maintaining fidelity to the authors’ intellectual contributions.
Introduction
The paper tries to find a data-driven way to discover and rate attacking patterns that raise a team’s immediate chance of success, noting that in football the signal is sparse: “due to the low-scoring nature of the game, successful attack patterns do not only involve those that lead to actual goals” [1]. Instead, sequences that culminate in clear chances or entry into a “danger zone” such as “the last 25 m of the pitch” [2] are treated as success. The central idea is state valuation: if executing a pattern increases the likelihood of scoring relative to the preceding state, the pattern is good. Accordingly, the task is framed as learning a function that maps full game settings (player and ball (x,y) positions, velocity vectors, and possession status) to real-valued scores that reflect how promising the situation is for the in-possession team.
Concretely, the authors extract open-play possession episodes in which a single team retains the ball until turnover, stoppage, or a success action (e.g., entering the final 25 m). Episodes are labeled positive when they end in success and negative otherwise. Using tracking data from European top-flight matches sampled at 25 Hz (covering all players and the ball, plus flags for possession and whether play is live) the approach “learn[s] a scoring function that maps game situations to real numbers using ideas and methods from deep reinforcement learning (RL),” leveraging recent advances in deep models for sequential decision problems [3, 4, 5–7]. The contributions are previewed as: modeling match evolution as Markovian transitions over richly specified states; proposing a convolutional neural network to approximate the value function that “rates game situations according to how good they are for a team”; and demonstrating empirical effectiveness on elite positional data. Together, this establishes a purely data-driven pipeline (from raw spatiotemporal states to valuations) that can surface “good (and bad) attacking patterns by measuring differences in likelihoods,” without injecting handcrafted tactical rules.
Related Work
This study deliberately avoids handcrafted tactical priors and instead learns valuations directly from raw spatiotemporal data. This approach contrasts with earlier efforts such as Link et al. [1], who manually defined a “dangerousity” function from factors like ball location and defensive pressure, calibrating parameters through expert knowledge and restricting analysis to “an area starting 34 m from the opponents’ goal.” Their derived indicators, including action value (the change in dangerousity produced by a player’s action), were limited in scope and interpretability.
Similarly, Lucey et al. [4] predicted shot success via handcrafted features, and Cervone et al. [8] modeled expected points in basketball through hierarchical statistical models, relying on domain-specific representations. Other research has explored spatiotemporal prediction: Copete et al. [9] used deep neural networks to model ball and player trajectories in simulated football, while Bialkowski et al. [3] applied minimum-entropy models to infer tactical formations such as 4-1-4-1 from positional data. Fernando et al. [10] clustered team scoring approaches, and Van Haaren et al. [11] analyzed ball possession phases by manually weighting event types (e.g., shots or crosses).
Event-based methods also include topic modeling approaches such as Wang et al. [12], who used Bayesian inference to discover recurring passing patterns across pitch zones, and Knauf et al. [2], who applied convolutional kernels to positional trajectories to detect initiation and scoring opportunity patterns. Broader unsupervised frameworks, like Haase and Brefeld [13], extracted all frequent multi-trajectory patterns, while Van Haaren et al. [14] used inductive logic programming to learn regular pass sequences. Lucey et al. [15] visualized plays originating from specific areas, and Brandt and Brefeld [16] applied PageRank algorithms to model team interactions. A comprehensive overview of these approaches was summarized by Rein and Memmert [17].
Unlike these works, which embed varying levels of expert assumptions or predefined tactical categories, the current method learns directly from tracking-based positional sequences, inferring what constitutes “good” or “dangerous” situations solely from empirical outcomes.
Contribution
Preliminaries and data specification
The study’s primary goal is to quantify the value of player positioning using tracking data from professional football matches. These data include the continuous x–y coordinates of all players and the ball, sampled at 25 frames per second, along with indicators for ball possession and whether play is active or halted.
A game setting is defined as a single frame encompassing all player and ball positions, movement vectors, and the team in possession. From these, the authors extract episodes of uninterrupted possession, each ending when the team either loses the ball, play stops, or a success action occurs. Two success definitions are tested: (1) entering the final 25 meters of the pitch and (2) possessing the ball within 18 meters of the opponent’s goal, roughly the penalty area.
The model’s aim is to learn a scoring function that assigns higher values to situations likely to lead to successful outcomes. “The optimal function ranks all successful situations higher than unsuccessful ones,” evaluated using the area under the ROC curve (AUC) [18].
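To make the ranking criterion concrete, here is a minimal pure-Python sketch of pairwise AUC: the fraction of (successful, unsuccessful) score pairs that the learned function orders correctly. The function name and toy scores are illustrative, not taken from the paper.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (successful, unsuccessful) pairs ranked correctly;
    ties count as half a correct pair."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Toy scores: 4 of the 6 (positive, negative) pairs are ordered correctly.
print(auc([0.9, 0.7, 0.6], [0.8, 0.3]))  # 4/6 ≈ 0.667
```

An optimal scoring function would reach AUC = 1.0 by ranking every successful situation above every unsuccessful one; 0.5 corresponds to random ordering.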
Learning
The valuation process builds upon deep reinforcement learning (RL), specifically its batch RL formulation, since “we cannot alter player and ball movements realistically” in recorded matches. The system therefore learns from fixed historical data rather than simulated interactions.
Football matches are modeled as Markov reward processes (MRPs) [19], where each state s represents a frame of play, and transitions p(s,s′) capture the probability of moving to the next configuration s′. A reward R(s,s′) equals 1 for successful episodes and 0 otherwise, with a discount factor γ∈[0,1). The return from a state s_t is defined as
G_t = Σ_{k≥0} γ^k R(s_{t+k}, s_{t+k+1}),
and the value function as its expected return, V(s) = E[G_t | s_t = s].
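As a worked illustration of these definitions, the discounted return of a recorded episode can be computed with a simple backward pass. The sketch assumes per-transition rewards that are zero everywhere except possibly the terminal step; it is illustrative, not the authors' code.

```python
def discounted_returns(rewards, gamma=0.95):
    """Compute G_t = R_t + gamma * G_{t+1} backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# A successful four-step episode: reward 1 only on the terminal transition.
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))
```

States closer to the success action receive higher returns, which is exactly the gradient of "promise" the value function is asked to reproduce in expectation.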
Deep Model
Game states are transformed into image-like tensors with nine channels representing player and ball positions, partial velocities, and a possession indicator (+1 or −1). A three-layer convolutional neural network (CNN) with 32 kernels per layer (6×6 filters, ReLU activation) encodes spatial information, followed by a fully connected layer of 256 neurons producing the scalar value V(s). A reduced version excluding movement channels serves as a baseline to assess the effect of dynamic input.
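A minimal sketch of how such an image-like tensor could be assembled in NumPy. The grid resolution and the exact channel layout below are my assumptions; the paper only lists the kinds of information encoded (positions, partial velocities, possession), not the channel order.

```python
import numpy as np

H, W = 68, 105  # grid resolution, roughly one cell per metre (assumed)

def encode_state(players, ball, possession, live=1.0):
    """Build a (9, H, W) tensor from one frame.

    Assumed channel layout (illustrative, not the paper's specification):
      0/1  occupancy of team A / team B players
      2    ball occupancy
      3/4  player x- and y-velocity components
      5/6  ball x- and y-velocity components
      7    possession plane, +1 or -1 everywhere
      8    live-play flag plane
    players: iterable of (x, y, vx, vy, team) with team in {+1, -1}.
    """
    s = np.zeros((9, H, W), dtype=np.float32)
    for x, y, vx, vy, team in players:
        i, j = min(int(y), H - 1), min(int(x), W - 1)
        s[0 if team > 0 else 1, i, j] = 1.0      # occupancy
        s[3, i, j], s[4, i, j] = vx, vy          # partial velocities
    bx, by, bvx, bvy = ball
    i, j = min(int(by), H - 1), min(int(bx), W - 1)
    s[2, i, j] = 1.0
    s[5, i, j], s[6, i, j] = bvx, bvy
    s[7] = possession                            # constant plane
    s[8] = live
    return s
```

Encoding the frame as stacked spatial planes is what lets the convolutional layers exploit local geometric structure (marking distances, passing lanes) rather than a flat coordinate list.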
Batch Reinforcement Learning
The authors employ the λ-return algorithm [25], a forward-view variant of TD(λ) [19], to estimate values from fixed episodes. For each sampled episode e = (s_1, ..., s_T), the n-step return is
G_t^(n) = Σ_{k=0}^{n−1} γ^k R(s_{t+k}, s_{t+k+1}) + γ^n V(s_{t+n}),
with the λ-return computed as a weighted average of these partial returns:
G_t^λ = (1−λ) Σ_{n=1}^{T−t−1} λ^{n−1} G_t^(n) + λ^{T−t−1} G_t.
The model minimizes the mean squared temporal-difference (TD) error between the network’s predictions and the λ-return targets,
L = Σ_t (G_t^λ − V(s_t))².
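The weighted average over n-step returns can be computed efficiently with an equivalent backward recursion, G_t = r_t + γ((1−λ)V(s_{t+1}) + λG_{t+1}). The sketch below assumes per-transition rewards and bootstrapped state values; it is an illustration of the λ-return idea, not the authors' implementation.

```python
def lambda_returns(rewards, next_values, lam=0.7, gamma=0.95):
    """Backward recursion equivalent to the forward-view lambda-return:
    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    rewards[t] and next_values[t] belong to the transition s_t -> s_{t+1};
    past the terminal state there is nothing to bootstrap, so G starts at 0."""
    G = 0.0
    out = [0.0] * len(rewards)
    last = len(rewards) - 1
    for t in reversed(range(len(rewards))):
        bootstrap = next_values[t] if t < last else 0.0
        G = rewards[t] + gamma * ((1.0 - lam) * bootstrap + lam * G)
        out[t] = G
    return out

# lam=1 recovers the plain discounted (Monte Carlo) return of the episode;
# lam=0 reduces to one-step TD targets r_t + gamma * V(s_{t+1}).
print(lambda_returns([0.0, 0.0, 1.0], [0.5, 0.5, 0.0], lam=1.0))
```

λ thus interpolates between high-variance Monte Carlo targets and high-bias one-step bootstrapping, which matters here because success rewards arrive only at episode ends.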
Learning
Each episode undergoes a forward pass to compute predicted values. Using these, λ-returns are calculated for all states, assigning a terminal value of 1 for successful episodes and 0 otherwise. These serve as supervised targets in a backpropagation update step. The approach is similar in structure to deep recurrent Q-network (DRQN) methods [26–28], though adapted for value estimation rather than policy optimization.
Empirical Results
Experimental Setup
The dataset comprises five top-tier European matches containing continuous tracking data of all players and the ball. Two definitions of success actions were evaluated: (1) carrying the ball into the final 25 meters of the pitch, and (2) maintaining possession within 18 meters of the goal. These produced 380 successful and 715 unsuccessful episodes (≈11,000 states) for the first case, and 224 successful and 866 unsuccessful episodes (≈11,300 states) for the second. To expand the dataset, all frames were randomly flipped along the x and y axes, mirroring play direction and possession flags, an augmentation method similar to those used in image recognition [29].
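The flipping augmentation can be sketched as follows for a channels-first state tensor. Mirroring a spatial map also requires negating the matching velocity components; the channel indices here are hypothetical, and depending on the encoding the team and possession planes may need swapping as well.

```python
import numpy as np

def flip_x(state, vx_channels=(3, 5)):
    """Mirror a (C, H, W) state tensor along the x (width) axis.

    Spatial maps are reflected and the x-velocity channels change sign,
    so player movement stays physically consistent with the mirrored
    geometry. Channel indices are illustrative assumptions."""
    out = state[:, :, ::-1].copy()
    for c in vx_channels:
        out[c] *= -1.0
    return out
```

Because football pitches are (statistically) symmetric along both axes, each recorded frame yields up to four geometrically valid training samples at no extra data-collection cost.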
Model performance was evaluated by predicting the likelihood of success at selected timestamps or field locations and comparing these predictions with actual outcomes. The resulting AUC [18] scores quantify how well the learned value function distinguishes between successful and unsuccessful possession phases.
Training Procedure
The architecture was implemented in TensorFlow, trained with Adam optimization [30] using batches of 30 episodes. The discount factor was set to γ=0.95 and the trace decay to λ=0.7 based on grid search. A leave-one-game-out (LOGO) cross-validation procedure ensured generalization across matches, with early stopping applied to prevent overfitting.
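The leave-one-game-out protocol amounts to the following splitting scheme; this is a generic sketch with made-up match identifiers, not the authors' pipeline code.

```python
def logo_splits(match_ids):
    """Leave-one-game-out: each match is held out once as the test set,
    so no frames from the evaluated match ever appear in training."""
    for held_out in match_ids:
        train = [m for m in match_ids if m != held_out]
        yield train, held_out

matches = ["match1", "match2", "match3", "match4", "match5"]
for train, test in logo_splits(matches):
    print(len(train), "training matches, testing on", test)
```

Splitting by whole match rather than by frame is important here: consecutive 25 Hz frames are nearly identical, so a random frame-level split would leak test situations into training and inflate the AUC.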
Results
Quantitative Evaluation
In Leave-One-Game-Out (LOGO) tests, the model was trained on four matches and tested on the fifth. Two input versions were compared: (1) positions only and (2) positions + movements.
Time-Dependent AUC
When scoring states between 3 and 15 seconds before possession ended, the AUC declined gradually with increasing temporal distance from the episode’s outcome (Fig. 3). Even at 13 seconds before the end, the AUC remained well above 0.5, indicating that the model captured meaningful spatiotemporal patterns rather than random fluctuations. Both success definitions yielded comparable performance, with slightly higher stability in the “25 m” scenario.

Area-Dependent AUC
AUCs were also computed for specific pitch zones (Fig. 4). The model performed best when the ball was closer to goal, confirming that valuations reflected attacking potential. Models including movement information consistently outperformed static positional ones, especially in advanced zones, suggesting the importance of velocity data for anticipating success. In contrast, a simple distance-to-goal baseline achieved an AUC of only 0.5, far below the learned model.

Visualization
To interpret learned valuations, the authors visualized high- and low-scoring situations from identical pitch areas.
High-valued scenes (Fig. 5) typically involved open passing options on the wing or fast, dynamic attacks where multiple players moved toward goal.

Low-valued scenes (Fig. 6) showed tight marking and limited passing lanes, often leading to immediate loss of possession.

Visualization of Outcomes
Further examples illustrated how valuations evolve over time. High-value states generally preceded goal-area entries or crosses, but outcomes still depended on execution quality and randomness. In Figure 7, for instance, two near-identical situations ended differently (one led to a goal attempt, the other to a defensive clearance) yet both received similar valuations, reflecting comparable underlying potential.

Visualization of Score Development
Finally, the analysis of temporal valuation shifts revealed the most impactful tactical patterns (Fig. 8). The largest positive temporal differences corresponded to long diagonal passes or quick central combinations that rapidly increased the team’s positional advantage. These patterns represent “surprising” plays that drastically improved success probability within a few seconds, showcasing the model’s capacity to highlight key attacking transitions.
Conclusion
The study introduced a deep reinforcement learning framework for valuating football player positioning directly from tracking data, without relying on handcrafted tactical inputs or domain heuristics. By representing each game frame as a multi-channel spatial tensor and training a convolutional network under a batch RL setup, the model successfully learned to assign meaningful value estimates to positional configurations. These valuations aligned closely with observable tactical outcomes, effectively distinguishing between dangerous and non-threatening situations.
Crucially, this constitutes “the first purely data-driven approach to machines that read and understand games,” bridging the gap toward computational tactics. The learned danger metric offers a foundation for deriving strategic insights, such as identifying when counterattacks outperform slower buildup play or when cross-field passes generate greater threat. By linking the model’s valuations with traditional performance indicators like passing speed or team expansion, analysts could automatically classify historical play patterns against specific opponents and design data-informed game plans. The approach thus demonstrates how learned positional valuations can extend beyond descriptive analytics to tactical decision support in professional football.
References
Dick, U., & Brefeld, U. (2019). Learning to rate player positioning in soccer. Big data, 7(1), 71-82. https://www.liebertpub.com/doi/10.1089/big.2018.0054
To keep this article concise, please refer to the original paper for the full list of references.