Assessing Individual Offensive Contributions and Tactical Strategies in Soccer with Markov Chains

Using Markov chains to understand the value of game situations and quantifying player’s contribution to creating goal-scoring…

Apr 12, 2023

Introduction and Objectives

The research presents a framework for tactical analysis and individual offensive production assessment in football using Markov chains. The author aims to understand the value of particular game situations, how those values vary across teams, and how to quantify a player’s contribution to creating good goal-scoring opportunities. The challenges include the difficulty of capturing all the information about the game state, sparse data, and the problem of dividing up credit.

Markov Chains

The author proposes Markov chains to model the likely outcomes of a possession after a number of iterations, based on the probabilities of transitioning from one state to another. Markov chains allow the author to look at all the possible ways a possession can unfold, and the absorbing states mean that possessions of arbitrary length are handled nicely. The downside is that they assume the next state depends only on the current state and not on the states that came before it, i.e., it doesn’t matter how the team got here, the probabilities of moving to the next state are the same regardless of the past.
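The memoryless assumption described above can be written compactly. The notation here is introduced only for illustration and is not taken from the paper: X_t is the state of the possession after t actions, and p_{ij} is the probability of moving from state i to state j.

```latex
P(X_{t+1} = j \mid X_t = i, X_{t-1}, \dots, X_0) = P(X_{t+1} = j \mid X_t = i) = p_{ij}
```

In words, conditioning on the full history adds nothing once the current state is known.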

Data

The author uses a dataset provided by StatDNA, which includes touch-by-touch data with (x, y) coordinates, event type, defensive pressure, and defensive state. The dataset covers the English Premier League in the 2010/11 season, with 123 matches, a minimum of 11 matches per team, and around 100,000 deliberate actions or about 800 actions per match.

States

The author defined 39 total states:

2 absorbing states: Goal or End of Possession
7 set pieces: Penalty, Short Corner, Long Corner, Short Free Kick, Long Free Kick, Shallow Throw-in, and Deep Throw-in
30 states defined by zonal location and defensive state.

These states help the author to capture the game’s complexity and provide a comprehensive analysis of the tactics and offensive production of individual players.

[Figures: Zones; Crosses (Pressure A); Crosses (Pressure B); Shots; Goals]

Important definitions and Transition matrices

The author also defined the terms “deliberate action” and “possession.” A deliberate action is any action where a player moves the ball in a controlled manner with an attempted outcome. A possession, on the other hand, is a series of consecutive deliberate actions performed by one team, interrupted only by a deliberate action performed by the other team or by the end of a half. As an example, a pass, shot, or dribble is a deliberate action, whereas a clearance or tackle is not.
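The possession definition above can be sketched as a small function. This is a minimal illustration, not the author’s code: the event fields `team`, `half`, `action`, and `deliberate` are hypothetical names, and the sample events are made up.

```python
def split_possessions(events):
    """Group deliberate actions into possessions.

    A possession ends when the other team performs a deliberate action
    or when the half changes. Non-deliberate actions are skipped and do
    not interrupt a possession.
    """
    possessions = []
    current = []
    for event in events:
        if not event["deliberate"]:
            continue  # e.g. clearances and tackles
        if current and (
            event["team"] != current[-1]["team"]
            or event["half"] != current[-1]["half"]
        ):
            possessions.append(current)
            current = []
        current.append(event)
    if current:
        possessions.append(current)
    return possessions


# Made-up event stream (field names are assumptions for this sketch).
events = [
    {"team": "A", "half": 1, "action": "pass", "deliberate": True},
    {"team": "A", "half": 1, "action": "dribble", "deliberate": True},
    {"team": "B", "half": 1, "action": "tackle", "deliberate": False},
    {"team": "A", "half": 1, "action": "pass", "deliberate": True},
    {"team": "B", "half": 1, "action": "pass", "deliberate": True},
    {"team": "B", "half": 2, "action": "pass", "deliberate": True},
]

possessions = split_possessions(events)
lengths = [len(p) for p in possessions]
```

Here the failed tackle by team B does not break team A’s possession, while the change of half splits team B’s two passes into separate possessions.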

The author calculated the probability of moving from one state to another for all combinations of the 39 states and collected them in a transition matrix. Absorbing states are a special case: the probability of remaining in the same state is 1, and the probability of moving to any other state is 0. The transition matrix itself gives the probability of ending up in a given state after one iteration; multiplying it by itself gives the probabilities after two iterations, and so on. The multiplication is repeated until the probability of ending in an absorbing state converges (n=20).
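To make the repeated multiplication concrete, here is a minimal sketch on a toy chain with two transient and two absorbing states. The state names other than Goal and End of Possession, and all of the probabilities, are invented for illustration; the paper’s matrix has 39 states.

```python
# Toy chain: illustrative states and probabilities, not values from the paper.
STATES = ["Midfield", "Final third", "Goal", "End of Possession"]

# P[i][j] is the probability of moving from state i to state j in one action.
P = [
    [0.50, 0.30, 0.00, 0.20],  # from Midfield
    [0.20, 0.40, 0.10, 0.30],  # from Final third
    [0.00, 0.00, 1.00, 0.00],  # Goal is absorbing
    [0.00, 0.00, 0.00, 1.00],  # End of Possession is absorbing
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def matrix_power(p, n):
    """Return p multiplied by itself n times (identity for n = 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, p)
    return result


# After 20 iterations almost all probability mass sits in the absorbing states.
P20 = matrix_power(P, 20)
GOAL = STATES.index("Goal")
p_goal = {state: P20[i][GOAL] for i, state in enumerate(STATES)}
```

For this toy chain, solving the absorption equations by hand gives a limiting P(Goal) of 0.125 from Midfield and roughly 0.208 from Final third; the 20-step values in `p_goal` are close to those limits.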

Validation

To validate the results, the study employs Monte Carlo Bootstrapping and K-Fold Validation techniques. Monte Carlo Bootstrapping generates 1000 samples with replacement, while K-Fold Validation uses 5 folds, with 4 for training and 1 for evaluation.
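The two validation schemes can be sketched as follows. The source does not say which unit is resampled, so this sketch assumes whole possessions are resampled; the toy possessions, the function names, and the choice of statistic are likewise hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: each possession is a sequence of state labels.
POSSESSIONS = [
    ["Midfield", "Final third", "Goal"],
    ["Midfield", "End of Possession"],
    ["Final third", "Midfield", "Final third", "End of Possession"],
    ["Midfield", "Final third", "End of Possession"],
    ["Final third", "Goal"],
    ["Midfield", "Midfield", "End of Possession"],
]


def transition_prob(possessions, src, dst):
    """Share of observed moves out of `src` that go to `dst` (0.0 if unseen)."""
    moves = Counter()
    for seq in possessions:
        for a, b in zip(seq, seq[1:]):
            if a == src:
                moves[b] += 1
    total = sum(moves.values())
    return moves[dst] / total if total else 0.0


def bootstrap(possessions, statistic, n_samples=1000, seed=0):
    """Recompute `statistic` on samples drawn with replacement."""
    rng = random.Random(seed)
    size = len(possessions)
    return [statistic(rng.choices(possessions, k=size)) for _ in range(n_samples)]


def k_fold_indices(n_items, k=5, seed=0):
    """Return k (train, test) index pairs; each item is tested exactly once."""
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    return [(sorted(set(order) - set(fold)), sorted(fold)) for fold in folds]


def final_third_to_goal(sample):
    return transition_prob(sample, "Final third", "Goal")


point_estimate = final_third_to_goal(POSSESSIONS)
estimates = sorted(bootstrap(POSSESSIONS, final_third_to_goal))
interval = (estimates[25], estimates[974])  # rough 95% percentile interval
splits = k_fold_indices(len(POSSESSIONS), k=5)
```

The spread of `estimates` indicates how sensitive a transition probability is to the particular possessions observed, which matters with sparse data; the `splits` pairs would be used to fit on four folds and evaluate on the fifth.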

Results

Comparing P(Goal)

The study compares P(Goal) for each state and presents the results in a table, where the columns are teams ordered by final standing, and the rows are P(Goal) for each state. The study finds that, in general, the lower a team finished in the table, the harder it found it to score. The exceptions are Manchester City (3rd, underperformed offensively) and Wolves (17th, overperformed offensively).

Set Pieces

The results also showed that different set pieces had varying levels of effectiveness in scoring goals, with penalties having the highest probability of scoring a goal (71.55%), while short free kicks had the lowest (1.08%). Long corners, despite being the second-best set piece, had a probability of only 2.39%.

[Figures: Corners; Counter Attacks]

The study also notes that existing metrics do not take the context of the game state into account: completed passes such as a back-pass, a square pass, and a through-ball that puts a teammate 1-on-1 with the keeper are all weighted equally. Goals are also weighted equally, regardless of how easy they were to score, and missed opportunities can still show up positively in metrics. To address this issue, the study weights each action by its incremental improvement of P(Goal).

Incremental improvement of P(Goal)

This first example follows a possession through three players (Player 1, Player 2, and Player 3) and three states (A, B, and C), each with its own probability of scoring (P(Goal)). The possession ends in a goal, whose P(Goal) is 1. Each player is credited with the change in P(Goal) that their action produces.

Player 1 has the ball in State A, with a P(Goal) of 0.25, and moves it to State B, with a P(Goal) of 0.17. Player 1’s contribution is therefore -0.08.

Player 2 moves the ball from State B (0.17) to State C, with a P(Goal) of 0.28, for a contribution of +0.11.

Player 3 scores from State C (0.28). Since the Goal state has a P(Goal) of 1, Player 3’s contribution is +0.72.

This information can be used to analyze the performance of each player in different states and their impact on the team’s overall performance.

In this second example, Player 1 is in State A with a P(Goal) of 0.15. By earning a penalty, he raises P(Goal) to 0.71, a contribution of +0.56. Player 2 then takes the penalty and misses; since the Penalty Missed state has a P(Goal) of 0, Player 2 ends with a contribution of -0.71.

As can be observed above, “Player 1 is rewarded for earning the penalty and Player 2 is heavily penalized for missing it.” This shows how the incremental improvement of P(Goal) can be used to weight each action differently based on its importance and impact on the game.
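The arithmetic behind both examples is a simple difference of consecutive P(Goal) values. The sketch below reproduces the numbers quoted above; the function name and the data layout are illustrative choices, not the paper’s.

```python
def action_credits(players, p_goal_path):
    """Credit each player with P(Goal) after their action minus P(Goal) before it.

    `p_goal_path` holds the P(Goal) of every state visited, so it has one
    more entry than `players`.
    """
    if len(p_goal_path) != len(players) + 1:
        raise ValueError("expected one more P(Goal) value than players")
    credits = {}
    for player, before, after in zip(players, p_goal_path, p_goal_path[1:]):
        credits[player] = round(credits.get(player, 0.0) + (after - before), 2)
    return credits


# Example 1: State A (0.25) -> State B (0.17) -> State C (0.28) -> Goal (1.0)
example_1 = action_credits(
    ["Player 1", "Player 2", "Player 3"], [0.25, 0.17, 0.28, 1.0]
)

# Example 2: State A (0.15) -> Penalty (0.71) -> Penalty Missed (0.0)
example_2 = action_credits(["Player 1", "Player 2"], [0.15, 0.71, 0.0])
```

Because the differences telescope, the credits in a possession always sum to the final P(Goal) minus the initial one: +0.75 in the first example and -0.15 in the second.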

Top and Worst Performing Players

After analyzing the data using the Markov chain framework, the author was able to identify the top and worst performers in terms of offensive production in the English Premier League during the 2010/11 season. The top performers were Tim Cahill, Yaya Toure, Cesc Fabregas, and Jordan Henderson. Surprisingly, there were also some lesser-known players among the top performers, such as James Morrison, Ricardo Fuller, and Chris Baird.

On the other hand, the worst performers were mainly goalkeepers, although strikers and defenders also appear. For instance, Darren Bent performed poorly: only one of the 17 goals he scored over the entire season falls within the sample set. In the sample he had 19 opportunities with a greater than 10% chance of scoring (22% on average) but converted only one of them. The author also identified some players who were poor at crossing the ball, such as Clichy, Young, and Kolarov. Further analysis could help determine which situations resulted in poor performance for these players.

Conclusion

In conclusion, the research paper presents a framework that employs Markov chains to analyze tactical strategy and assess individual offensive production in soccer, using touch-by-touch data provided by StatDNA. The study defines 39 states and uses transition matrices to calculate the probability of moving from one state to another, validating the results using Monte Carlo Bootstrapping and K-Fold Validation techniques. The study also proposes weighting each action by its incremental improvement of P(Goal) to address the limitations of existing metrics.

Rudd, S. (2011, October). A framework for tactical analysis and individual offensive production assessment in soccer using Markov chains. New England Symposium on Statistics in Sports. https://nessis.org/nessis11/rudd.pdf