How To Identify Tactical Formations in Football?
A Review of Formation Identification Principles and Limitations in Soccer Analytics
The following summary critically reviews the research conducted by Hadi Sotudeh, titled "The principles of tactical formation identification in association football (soccer) — a survey". All data, figures, and analysis presented here are drawn from the original work; I do not claim any authorship or ownership of the content. This summary aims to provide a concise, technically informed synthesis of the paper’s findings, methodologies, and implications while maintaining fidelity to the author’s intellectual contributions.
Introduction
Formations in football, understood as structured spatial arrangements of players, embody a long-standing principle of organized collective behavior. The same principle appears in military history and biological systems, and it influences fields as diverse as robotics, gaming, and team sports. Within football, formations have evolved from early, attack-heavy patterns such as 2-2-6 and 2-3-5 to balanced systems such as 4-2-3-1. That evolution has consistently responded to regulatory changes and tactical demands, which shows that formations are dynamic and that no single system is optimal [13, 14].
The operational definition adopted in this work treats a "formation" as a semantic abstraction of team structure. It is typically denoted by a numeric label (e.g., 4-4-2) that summarizes how outfield players are distributed across horizontal lines [16]. Despite their widespread use, these labels are not standardized and often assume symmetric structures, even though positioning in real matches can deviate significantly from that convention.
Formational fluidity during matches, driven by scorelines, coaching interventions, player substitutions, or tactical transitions [18–26], further complicates static classification approaches, which are still common in media portrayals and historical archives [11, 28]. Nevertheless, formations serve critical functions: enhancing coordination, simplifying communication, conserving energy, and reinforcing tactical consistency. They provide “a reference for players to remember their organization and responsibilities when distracted” [29], and reduce on-pitch confusion [30].
The choice of formation is also a foundational element of opposition analysis [13, 20], with implications for counter-strategy development. The widely publicized “spygate” case [37] illustrates how far teams will go to understand an opponent’s structural setup. Formation choices are informed not only by strategic goals but also by player skillsets, contextual constraints (e.g., home or away), and institutional traditions [11, 19, 43, 45].
Goal
Despite the strategic centrality of formations in football, current analysis practices remain largely qualitative, relying on “isolated observations” or static snapshots such as “most seen arrangements” or possession-specific patterns [16, 47, 48, 49, 50]. This subjectivity, compounded by inconsistent interpretations, renders formation analysis both time-consuming and imprecise. A striking illustration of this issue is the mere 65% agreement between two major industry data providers regarding starting formations for the 2022 FIFA Men’s World Cup [52, 53], underscoring the “lack of ground truth formation labels” [54].
To address these challenges, recent research has increasingly focused on automating formation recognition using data-driven methods. These include machine learning models that can identify formations in scalable, objective, and reproducible ways. Beyond basic tactical interpretation, such models have utility in recruitment, match planning, and performance analysis—for example, by linking formation choice to “success metrics (e.g., goals, expected goals, scoring zone entries)” [30, 55, 56] or evaluating the physical load implications of different tactical systems [57–59]. They also offer opportunities for real-time applications in coaching or broadcasting.
Recognizing the fragmented and evolving nature of this research space, this paper introduces a structured survey of formation identification methodologies based on event and tracking data. Its aim is to synthesize existing approaches, identify methodological gaps, and provide a foundation for future studies in the field.
Method
This paper is a broad-scoped survey rather than a formal systematic review. It covers the key methodological principles used over the past decades to identify football formations from event and tracking data. The review builds on foundational literature, including academic publications, theses, presentations, books, and patents. It starts from seminal football-specific studies and expands through citation tracking and relevant methodologies from adjacent fields and other sports.
The survey organizes formation identification methods into a pipeline comprising preprocessing, data representation, formation identification, and evaluation. The process bifurcates into two major paths: team-level and position-level identification. In the team-level approach, the goal is to assign a formation label to the entire team as a whole. Conversely, the position-level approach first infers individual player roles on the pitch and then maps the collective structure to a formation using a predefined template or derived cluster.
Tactical position identification is thus considered an integral part of formation recognition, especially in the position-level strategy. The structure of the pipeline is visualized in Figure 2, and the remaining sections of the paper explore each stage of this pipeline in detail, from input data types to representation schemes, identification logic, and evaluation frameworks.
Data
Event data
Event data captures discrete on-ball actions such as passes, shots, throw-ins, and fouls, annotated with metadata including timestamps, field coordinates, player identities, and outcomes. Modern datasets are now collected via semi-automated systems operated by trained annotators [69, 70]. Within this study, event data is employed primarily for identifying match segments—discrete periods of play significant for tactical interpretation.
Tracking data
Tracking data provides continuous spatio-temporal records of player and ball positions, sampled at high frequencies (e.g., 25 Hz), yielding millions of data points per match [74]. These data are obtained through various technological setups: optical tracking systems installed in stadiums [71], GPS and radar sensors worn by players [72], or deep learning techniques applied to broadcast video [73]. Tracking data underpins all pipeline stages in this study, enabling the analysis of formations in both static snapshots and dynamic sequences.
Preprocessing
Preprocessing aims to standardize input data to enable reliable formation analysis across matches and contexts. Key steps include aligning team orientation—ensuring teams consistently attack in a single direction, typically bottom-to-top. Additionally, goalkeeper positions are often excluded, as they contribute minimally to formation structure in the outfield context. Pitch dimensions, which can vary by stadium, are normalized to a standard template to ensure positional comparability across venues.
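The three preprocessing steps above can be sketched in a few lines of Python. This is an illustration, not the survey's implementation. It assumes a hypothetical frame format: a dict from player ID to `(x, y)` in metres, with `x` across the pitch width and `y` along its length. It also assumes a 105 × 68 m standard template.

```python
from typing import Dict, Tuple

# Assumed standard template (metres); the survey does not prescribe these values.
STD_LENGTH = 105.0
STD_WIDTH = 68.0

# Hypothetical frame format: player_id -> (x across width, y along length).
Frame = Dict[str, Tuple[float, float]]


def preprocess_frame(frame: Frame, pitch_length: float, pitch_width: float,
                     attacks_downward: bool, goalkeeper_id: str) -> Frame:
    """Align direction of play, drop the goalkeeper, rescale to the standard pitch."""
    out: Frame = {}
    for pid, (x, y) in frame.items():
        if pid == goalkeeper_id:
            continue  # goalkeepers are usually excluded from the outfield shape
        if attacks_downward:
            # 180-degree rotation about the pitch centre, so the team attacks bottom-to-top.
            x, y = pitch_width - x, pitch_length - y
        # Map venue-specific dimensions onto the standard template.
        out[pid] = (x * STD_WIDTH / pitch_width, y * STD_LENGTH / pitch_length)
    return out
```

Rotating the whole frame by 180 degrees, rather than only flipping `y`, keeps left and right flanks consistent with the new direction of play.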
Two additional tasks—explored in subsequent subsections—address temporal segmentation of play (match segments) and spatial normalization (location transformation). These procedures ensure that the spatio-temporal representation of formations is coherent across both time and space, regardless of contextual variation.
Match Segments
To account for the dynamic nature of football formations, matches are divided into meaningful temporal units known as phases of play. These segments—commonly classified into possession (attack), non-possession (defense), and transitions (attack-to-defense or defense-to-attack)—enable more accurate capture of tactical structures than static, full-match labels. Nevertheless, the exact definition of a segment is subjective.
Recent approaches vary in granularity. Some studies segment matches into fixed time windows (e.g., 5–15 minutes) or use ball possession to define sequences, often removing moments such as restarts or short interruptions. Others identify subphases such as build-up or pressing blocks, using changes in the ball's zone or machine learning methods such as CNNs trained on tracking data. More advanced techniques apply graph-based change-point detection, for example on Delaunay adjacency matrices or via planarity testing, to find intervals with a homogeneous spatial structure.
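To illustrate possession-based segmentation, the sketch below splits a per-frame possession sequence into contiguous runs. It assumes a hypothetical input in which each frame is labelled with the team in possession, or `None` during dead-ball phases such as restarts. The 25 Hz frame rate and the 5-second minimum duration are illustrative defaults, not values taken from the survey.

```python
from typing import List, Optional, Tuple


def possession_segments(possession: List[Optional[str]], fps: int = 25,
                        min_seconds: float = 5.0) -> List[Tuple[str, int, int]]:
    """Split per-frame possession labels into (team, start, end_exclusive) segments.

    None marks dead-ball frames (restarts, stoppages); those runs are dropped.
    Runs shorter than min_seconds are discarded as short interruptions.
    """
    segments: List[Tuple[str, int, int]] = []
    start = 0
    for i in range(1, len(possession) + 1):
        # Close the current run at the end of the data or when possession changes.
        if i == len(possession) or possession[i] != possession[start]:
            team = possession[start]
            if team is not None and (i - start) >= min_seconds * fps:
                segments.append((team, start, i))
            start = i
    return segments
```

For example, with `fps=1` and `min_seconds=2`, the sequence `["A", "A", "A", None, "B", "A", "A"]` yields `[("A", 0, 3), ("A", 5, 7)]`. The one-frame spell for team B is discarded as too short.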
Crucially, segmenting the match allows researchers to exclude atypical scenarios like set pieces and focus on tactical continuity. It also enables the detection of sub-formations specific to each phase which would otherwise be obscured in full-match summaries.
Normalization
Normalization aims to standardize player locations so that identical formations are classified consistently, irrespective of their absolute position on the pitch. This is essential for robust formation recognition across different phases, such as defending deep versus attacking high.
One common normalization strategy involves translating player positions (typically by aligning the team's centroid or using k-nearest-neighbor centering) to the pitch center. Additionally, scaling transformations like min-max normalization, range scaling, or standard deviation normalization are applied to counter differences in formation compactness or spread.
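The sketch below translates the team centroid and then applies a single, isotropic scale factor: the root-mean-square distance to the centroid, one variant of standard deviation normalization. It assumes the pitch centre is the coordinate origin, so moving the centroid to the origin is the same as moving it to the pitch centre.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_shape(points: List[Point]) -> List[Point]:
    """Move the team centroid to the origin (assumed pitch centre) and divide by spread."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Isotropic spread: one scale factor shared by both axes keeps the aspect ratio.
    spread = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
    if spread == 0:
        raise ValueError("degenerate shape: all players at one point")
    return [((x - cx) / spread, (y - cy) / spread) for x, y in points]
```

Because both axes share one scale factor, the depth-to-width ratio survives. Scaling each axis separately, as per-axis min-max normalization does, would also erase that ratio, which is one way a stretched formation can be distorted.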
These steps are conceptually tied to Procrustes analysis, a classical shape-matching technique widely used in biology. However, normalization introduces nontrivial risks. Scaling can exaggerate spatial variance, so variations of a single tactical setup may be misrepresented as different formations. Translation and scaling can harmonize formations across zones (e.g., defense and attack), but they may fail when a formation structurally stretches or deforms.

Thus, while normalization can be helpful, its potential distortions imply that similar standardization objectives might be more reliably achieved through robust downstream modeling stages, such as feature representation or clustering.
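For reference, a Procrustes comparison of two shapes can be sketched in pure Python by treating 2D points as complex numbers. Both shapes are centred and scaled to unit norm, and the optimal rotation is then found in closed form. The sketch assumes the two point lists are already in corresponding order, so player k in one shape matches player k in the other. It also allows arbitrary rotation, which may be undesirable for formations once the attacking direction has been aligned.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _centre_and_scale(points: List[Point]) -> List[complex]:
    """Represent points as complex numbers, centred on their centroid, with unit norm."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]


def procrustes_distance(a: List[Point], b: List[Point]) -> float:
    """Residual after optimally rotating shape a onto shape b (same point order)."""
    za, zb = _centre_and_scale(a), _centre_and_scale(b)
    s = sum(p * q.conjugate() for p, q in zip(za, zb))
    # For unit-norm shapes, min over theta of sum |e^{i theta} a_k - b_k|^2 equals 2 - 2|s|.
    return math.sqrt(max(0.0, 2.0 - 2.0 * abs(s)))
```

A copy of a shape that is translated, scaled, and rotated gives a distance of (numerically) zero. Mirroring one player across the back line gives a clearly positive distance.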
Team Level
Representation
Effective team-level formation representations must distinguish between different structures, remain invariant to minor player displacements, and consistently reflect identical setups.
The most common method, Average Player Locations [25, 85, 108, 109], computes mean coordinates over a segment. This can be misleading when players change sides or positions, producing central convergence artifacts that distort the formation's shape. Averaging over shorter time windows helps but does not eliminate the issue.
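The central-convergence problem is easy to reproduce. The sketch below computes average locations from hypothetical frames, each mapping a player ID to `(x, y)`, in which two wide players swap flanks halfway through a segment.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def average_locations(frames: List[Dict[str, Point]]) -> Dict[str, Point]:
    """Mean (x, y) per player over all frames in which that player appears."""
    totals: Dict[str, List[float]] = {}
    for frame in frames:
        for pid, (x, y) in frame.items():
            acc = totals.setdefault(pid, [0.0, 0.0, 0.0])  # [sum_x, sum_y, count]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    return {pid: (sx / n, sy / n) for pid, (sx, sy, n) in totals.items()}


# Two wingers swap flanks halfway through the segment.
frames = [{"L": (10.0, 60.0), "R": (58.0, 60.0)},
          {"L": (58.0, 60.0), "R": (10.0, 60.0)}]
print(average_locations(frames))
```

Both players end up at the same central point, (34.0, 60.0), even though neither ever played centrally.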

Hand-engineered Features (such as centroid, spread, convex hulls, or spatial grids) encode spatial distributions but rely heavily on manual feature design. For instance, presence grids (e.g., 5×5 cells) convert spatial occupancy into binary vectors, but their fidelity depends on grid granularity and alignment.
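A presence grid of the kind described above can be sketched as follows. The sketch assumes coordinates in metres on a 68 × 105 m pitch (width × length) and a 5 × 5 grid; both choices are illustrative.

```python
from typing import List, Tuple


def presence_grid(points: List[Tuple[float, float]], width: float = 68.0,
                  length: float = 105.0, cells: int = 5) -> List[int]:
    """Binary occupancy vector in row-major order: 1 if any player is in the cell."""
    grid = [0] * (cells * cells)
    for x, y in points:
        # Clamp so players exactly on the far touchline or goal line fall in the last cell.
        col = min(max(int(x / width * cells), 0), cells - 1)
        row = min(max(int(y / length * cells), 0), cells - 1)
        grid[row * cells + col] = 1
    return grid
```

A player standing near a cell boundary can set a different bit after a small movement. This is the sensitivity to grid granularity and alignment noted above.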

Graph Representations model player connectivity using methods such as minimum spanning trees or nearest neighbors [10, 92, 120–126], or Delaunay triangulations [104, 105, 127, 128]. They are more expressive and more sensitive to spatial topology, but these graphs are often non-unique and fragile to slight location shifts, which risks inconsistent formation classification. This sensitivity is especially problematic because small positional changes should not yield structurally different formations.
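To make this fragility concrete, the sketch below builds a minimum spanning tree over player positions using Prim's algorithm. Prim's algorithm is a textbook method swapped in here for illustration; the cited studies may construct their graphs differently.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def minimum_spanning_tree(points: List[Point]) -> Set[Tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph; edges returned as (i, j), i < j."""
    n = len(points)
    if n == 0:
        return set()
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: Set[Tuple[int, int]] = set()
    for _ in range(n):
        # Add the closest node not yet in the tree (O(n^2) overall; fine for 10 players).
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

For instance, the points (0, 0), (10, 0), and (5.2, 8.7) give the edges (0, 1) and (1, 2). Moving the third player 0.3 m to (5.2, 8.4) gives (0, 2) and (1, 2) instead.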

Despite successful use in other domains, graph-based methods for football formations must address these limitations to ensure robustness and interpretability.
Identification
Formation identification at the team level typically employs two strategies: template matching and clustering.
Template-based methods rely on predefined formations like 4-4-2, matching them via distance metrics (e.g., Euclidean, graph edit) or supervised models (e.g., SVMs, neural networks). However, maintaining a current and consistent template list is challenging due to varying taxonomies and evolving tactics. For instance, only 30% agreement was found among three industry providers on 44 formations, which highlights the subjectivity and inconsistency of existing labels.
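The sketch below illustrates template matching. Each template is a list of slot coordinates. Players are matched one-to-one to slots by minimizing the total Euclidean distance, and the template with the lowest cost wins. The templates and coordinates are hypothetical and are assumed to be normalized into the same frame already. The brute-force search is for illustration only.

```python
import itertools
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def match_cost(players: List[Point], template: List[Point]) -> float:
    """Smallest total Euclidean distance over one-to-one player-to-slot assignments.

    Brute force over permutations, only practical for tiny toy examples. For 10
    outfield players, an assignment solver such as scipy.optimize.linear_sum_assignment
    (Hungarian-style) would be the usual choice.
    """
    if len(players) != len(template):
        raise ValueError("players and template slots must have the same length")
    return min(sum(math.dist(p, template[j]) for p, j in zip(players, perm))
               for perm in itertools.permutations(range(len(template))))


def classify(players: List[Point], templates: Dict[str, List[Point]]) -> str:
    """Return the label of the template with the lowest matching cost."""
    return min(templates, key=lambda name: match_cost(players, templates[name]))
```

Consider two hypothetical four-player templates: `"2-2"` at (20, 30), (48, 30), (20, 60), (48, 60), and `"3-1"` at (14, 30), (34, 30), (54, 30), (34, 60). The players (22, 32), (46, 29), (19, 58), (50, 62) are classified as `"2-2"`.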
Clustering approaches bypass label dependency by learning formations directly from data. Algorithms such as k-means or hierarchical clustering segment players' spatial coordinates to infer the number of horizontal or vertical lines. While clustering is more flexible, it still requires careful calibration of cluster numbers and thresholds to avoid misclassification.
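As a small example of inferring horizontal lines by clustering, the sketch below applies single-linkage hierarchical clustering to the players' depth (`y`) coordinates. In one dimension, this reduces to cutting the sorted depths at the largest gaps. It assumes the team attacks bottom-to-top and that the caller chooses the number of lines. Choosing that number is exactly the calibration problem mentioned above.

```python
from typing import List, Tuple


def detect_lines(points: List[Tuple[float, float]], n_lines: int) -> str:
    """Group outfield players into horizontal lines and return a label like '4-4-2'.

    Single-linkage clustering on depth: cut the sorted y values at the
    n_lines - 1 largest gaps. Lines are listed from defence to attack.
    """
    depths = sorted(y for _, y in points)
    # Indices i where a gap depths[i] - depths[i-1] could separate two lines, largest first.
    gaps = sorted(range(1, len(depths)), key=lambda i: depths[i] - depths[i - 1],
                  reverse=True)
    cuts = sorted(gaps[: n_lines - 1])
    sizes, prev = [], 0
    for c in cuts + [len(depths)]:
        sizes.append(c - prev)
        prev = c
    return "-".join(str(s) for s in sizes)
```

For example, defenders near y = 20, midfielders near y = 40, and two forwards near y = 60 yield `"4-4-2"` with `n_lines=3`. The same players yield `"8-2"` with `n_lines=2`.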
Position Level

The position-level approach defines formations by first identifying individual player roles based on where players spend most of their time. Unlike player IDs, position labels remain consistent across substitutions and matches. A key constraint is that "no two teammates can occupy the same position simultaneously" [9].
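A minimal sketch of how per-frame role assignments might be aggregated into each player's dominant position while enforcing the one-player-per-position constraint. The input format (one dict per frame mapping player ID to role) and the role names are hypothetical, and the per-frame assignment itself is assumed to come from an earlier step.

```python
from collections import Counter
from typing import Dict, List


def dominant_roles(assignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Most frequent role per player, given one role assignment per frame.

    Frames in which two teammates share a role are rejected, mirroring the
    constraint that no two teammates occupy the same position simultaneously.
    """
    counts: Dict[str, Counter] = {}
    for t, frame in enumerate(assignments):
        if len(set(frame.values())) != len(frame):
            raise ValueError(f"frame {t}: two players share a role")
        for pid, role in frame.items():
            counts.setdefault(pid, Counter())[role] += 1
    return {pid: c.most_common(1)[0][0] for pid, c in counts.items()}
```

A player who is listed as a hypothetical "LB" in most frames but briefly swaps with a teammate is still labelled "LB".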

Representation
Beyond raw 2D coordinates, position representations fall into three main categories.
Relative locations describe each player's position in relation to teammates using angle bins or counts of nearby players, capturing spatial role semantics [83].
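The sketch below builds an angle-bin descriptor in the spirit of relative-location representations. For one player, it counts the teammates that fall in each of a fixed number of angular sectors around them. The sector count and the orientation convention are assumptions, not taken from the cited work.

```python
import math
from typing import List, Tuple


def angle_histogram(points: List[Tuple[float, float]], idx: int, bins: int = 8) -> List[int]:
    """Count teammates of player idx in each of `bins` equal angular sectors.

    Angle 0 points along +x and sectors run counter-clockwise.
    """
    hist = [0] * bins
    x0, y0 = points[idx]
    for j, (x, y) in enumerate(points):
        if j == idx:
            continue
        angle = math.atan2(y - y0, x - x0) % (2 * math.pi)  # map to [0, 2*pi)
        # The final modulo guards against rounding that lands exactly on 2*pi.
        hist[int(angle / (2 * math.pi) * bins) % bins] += 1
    return hist
```

Because the descriptor depends only on directions to teammates, it is unaffected by where the whole team stands on the pitch.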

Distributions, such as heatmaps or bivariate normal distributions [88], model occupancy probabilities over the pitch [83, 165].
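As an example of a distribution-based representation, the sketch below fits a bivariate normal (maximum-likelihood mean and covariance) to one player's positions and evaluates its density. The input, a list of `(x, y)` samples for a single player, is an assumed format.

```python
import math
from typing import List, Tuple


def fit_bivariate_normal(samples: List[Tuple[float, float]]):
    """Maximum-likelihood mean (mx, my) and covariance (sxx, sxy, syy)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    return (mx, my), (sxx, sxy, syy)


def bivariate_pdf(point, mean, cov) -> float:
    """Density of the fitted bivariate normal at `point`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))
```

With the four samples (0, 0), (2, 0), (0, 2), and (2, 2), the mean is (1, 1), the covariance is the identity, and the density at the mean is 1/(2π).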
Image-based representations convert spatial data into color-coded inputs for image classifiers, enabling deep learning applications [99].
Identification
Position-level identification mirrors the team-level approach, using either templates or clustering. Template methods rely on predefined position maps, applying rule-based region matching [166, 167], similarity measures (e.g., Chi-square, Naive Bayes) [83], or deep learning classifiers like ResNet on spatial images [99]. However, templates still face the same issues of subjectivity and upkeep as at the team level.
Clustering approaches, such as k-means [9, 31, 45, 51, 83, 87, 102, 168–171], GMMs [25, 103, 172–175], or hierarchical clustering [25, 55, 88, 91, 96, 97, 104, 175–179], infer positions directly from tracking data without predefined labels. The number of clusters is tuned via methods such as dendrogram analysis or expert input [55, 87, 88, 105]. This offers flexibility but requires extensive match data.
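As one concrete instance of the clustering route, the sketch below runs Lloyd's k-means on 2D points. The input is hypothetical, for example per-segment average positions pooled across matches. The caller fixes the number of clusters by supplying initial centroids, which stands in for the tuning step described above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], init: List[Point],
           iters: int = 50) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means with caller-supplied initial centroids; returns (centroids, labels)."""
    centroids = list(init)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        new: List[Point] = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Update step: move the centroid to the mean of its members.
                new.append((sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members)))
            else:
                new.append(c)  # keep an empty cluster's centroid where it was
        if new == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new
    return centroids, labels
```

Each resulting centroid can then be treated as a candidate position label. GMMs or hierarchical clustering, as cited above, are drop-in alternatives with different assumptions about cluster shape.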
Evaluation
Most formation studies lack comprehensive evaluation due to challenges in defining ground truth and consistent metrics [106, 180]. Quantitative validation is limited, but design-related aspects (such as robustness to input noise, reproducibility, and interpretability) are critical for assessing methodological soundness [181,182]. Qualitative evaluation considers whether outputs behave intuitively and provide useful insights for practitioners [183].
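One simple quantitative check, in the spirit of the 65% provider agreement cited earlier, is to compare two label sources directly. The sketch below reports observed agreement together with Cohen's kappa. Kappa is a standard chance-corrected statistic swapped in here for illustration; it is not a metric prescribed by the survey, and the labels are invented.

```python
from collections import Counter
from typing import List, Tuple


def agreement(labels_a: List[str], labels_b: List[str]) -> Tuple[float, float]:
    """Observed agreement and Cohen's kappa between two sources' labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance if the two sources labelled independently.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa
```

For the hypothetical labels `["4-4-2", "4-3-3", "4-4-2", "3-5-2"]` and `["4-4-2", "4-2-3-1", "4-4-2", "3-5-2"]`, observed agreement is 0.75, while kappa is 7/11 (about 0.64) once chance agreement is discounted.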
Discussion & Conclusion
Formations, defined as spatial player arrangements, remain conceptually ambiguous and are still mostly analyzed qualitatively. This survey structured the field into preprocessing (e.g., segmentation, normalization), team- or position-level approaches, and representation (e.g., averages, graphs) followed by identification via templates or clustering.
While average locations are common, they risk misinterpretation; graph-based representations offer promise but require careful design. Clustering avoids label subjectivity but demands large datasets. Because the author notes stronger consensus on position labels than on formation labels, the position-level path may be more robust. Future work should integrate contextual match dynamics and scale to longitudinal datasets for broader insights.
References
Sotudeh, H. (2025). The principles of tactical formation identification in association football (soccer)—a survey. Frontiers in Sports and Active Living, 6, 1512386. https://doi.org/10.3389/fspor.2024.1512386
To keep this article concise, please refer to the original paper for the full list of references.