Introduction
Hello everyone! This is Atom from Playbox. Thank you always for your support!
Today, I’d like to share some behind-the-scenes details from the article published in the Asahi Shimbun (a major Japanese daily newspaper), in which we analysed the movements of Kaoru Mitoma, the 28-year-old left winger for Brighton & Hove Albion and the Japan national team. This article was created with the support of Rikuhei, a junior colleague from Nagoya University, and Yuki, a junior colleague from the University of Tsukuba.
Rikuhei is one of Japan's top football data analysis researchers, and the only Japanese researcher to have presented at the StatsBomb Conference.
Yuki is a football analyst and a computer vision researcher at the University of Tsukuba. We’ve shared the same undergraduate course, football team, and research lab, but I think he might understand the game on a deeper level than I do.
That was a bit of a long introduction, but I hope you enjoy this as a beginner's guide to understanding the cutting edge of football analysis. This article is the first in a three-part series.
- Behind the Scenes of Football Analytics: Part 1 – How to Extract Match Data from Broadcast Footage
- Behind the Scenes of Football Analytics: Part 2 – Analysing the Ball Carrier (xG, VAEP)
- Behind the Scenes of Football Analytics: Part 3 – Analysing Off-Ball Attacking Patterns (OBSO)
What data is used in football analytics?
Manual Input vs. Sensors vs. AI-based Tracking
In sports, there are roughly three main methods for collecting data:
- Manual Input: Watching footage and manually recording the positions of players and the ball, as well as specific actions.
- Sensors: Equipping players with GPS devices to collect data such as location and speed.
- AI-based Tracking (Video Analysis): Using artificial intelligence to automatically extract information such as player positions, ball position data, and match events from video footage.
Each method has its own strengths and weaknesses, which can be summarised as follows.
| Method | Cost | Accuracy | Real-Time Capability | Access to Opponent Data | Available Data Types |
| --- | --- | --- | --- | --- | --- |
| Manual Input | High | Good | No | Yes | Event data only |
| Sensors | High | Excellent | Yes | No | Tracking data only |
| AI-based Tracking (Video Analysis) | Medium | Good–Excellent | Limited–Moderate | Yes | Tracking data + Event data |
With AI-based tracking (video analysis), there is a trade-off between accuracy and processing time: improving accuracy tends to increase processing time, while faster processing may reduce accuracy. One of its major advantages, however, is that it allows analysts to obtain both tracking and event data, including information on opposing teams. Given that video analysts often handle the filming process themselves, this approach tends to be the most flexible and practical in real-world settings. Accuracy is also improving year by year, which is why Playbox is focusing heavily on AI-based tracking.
Sensor-based tracking systems, such as GPS with built-in accelerometers, have also improved significantly; errors are now typically within one metre. While combining them with AI-based tracking offers an ideal setup, sensor systems cannot capture data on the opposing team, and the need to attach and manage physical devices adds unexpected complexity and cost. They also lack event data like passes or shots, limiting their analytical value in some contexts.
Types of Video Footage Available for Football Analysis
There are three main types of video footage commonly used in football analysis:
- Broadcast Footage: This includes footage from televised matches, which is widely available to the public. While frequent camera switches and zoom-ins can make analysis challenging, the ease of access is a major advantage.
- Fixed Wide-Angle Camera Footage (used primarily by Playbox): This method uses a simple camera setup to continuously capture the entire pitch. The footage is stable and easy to analyse. The emphasis on capturing the whole pitch, however, comes at the expense of the resolution needed for detailed analysis of individual players — for example, it can be difficult to identify shirt numbers or subtle movements.
- Drone Footage: Captured from above, this footage is ideal for tactical analysis due to its wide coverage. However, it comes with high costs and legal restrictions.
Types of Data That Can Be Extracted from Video Footage
There are two main types of data that can be obtained from video footage:
- Tracking Data: Includes information such as player and ball positions, movement speed, and distance covered.
- Event Data: Refers to specific in-game actions such as passes, shots, and dribbles.
By combining these two types of data, video analysis can be used for a wide range of purposes — from detailed tactical analysis to evaluating individual player performance.
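As a minimal illustration of what these two data types look like in practice, here is a small Python sketch. The record shape and field names (`TrackingFrame`, `player_id`, coordinates in metres) are our own assumptions for this example, not a standard format; the point is that once tracking data exists as timestamped pitch coordinates, quantities such as distance covered fall out directly:

```python
import math
from dataclasses import dataclass

@dataclass
class TrackingFrame:
    t: float          # match time in seconds
    player_id: int
    x: float          # pitch coordinates in metres
    y: float

def distance_covered(frames):
    """Total distance (in metres) covered across consecutive frames,
    summing the straight-line step between each pair of samples."""
    frames = sorted(frames, key=lambda f: f.t)
    return sum(math.hypot(b.x - a.x, b.y - a.y)
               for a, b in zip(frames, frames[1:]))

# Two samples 3 m along and 4 m across -> 5 m covered
path = [TrackingFrame(0.0, 7, 0.0, 0.0), TrackingFrame(1.0, 7, 3.0, 4.0)]
print(distance_covered(path))
```

Event data, by contrast, is sparse: a short list of (timestamp, action, player) records rather than a dense per-frame stream, which is why the two complement each other so well.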
Our Reason for Choosing Broadcast Footage in This Analysis
The match analysed in this article — a World Cup qualifier played by the Japan national team — was not attended in person, so the only available match footage was from the TV broadcast. We confirmed with a newspaper company that analysing and writing an article based on this type of public footage poses no problem. However, whether it is legally acceptable to extract data from such footage and use it for commercial purposes remains unclear.
At Playbox, our standard approach is to receive footage directly from sports teams, perform analysis on it, and then provide results. Because of this, we have not yet encountered this particular issue firsthand.
Recently, though, we’ve received inquiries from content creators who say, for example, “Since we can’t stream actual match footage on YouTube, we’d like to recreate the match using animation based on data extracted from the video.” This raises some legal grey areas, and we plan to consult a solicitor soon to clarify the situation.
Incidentally, when we asked ChatGPT about this, it responded:
“If the footage is copyrighted, any data or animation derived from it may be considered a ‘derivative work.’ Therefore, for commercial use, it’s advisable to obtain permission from the copyright holder. That said, if the extracted data is judged to be ‘factual information without originality,’ permission may not be required. The decision ultimately depends on the specific case, so consulting a legal expert is recommended.”
Makes sense. We’re going to consult a solicitor shortly.
In the next section, we’ll explain in detail how data is extracted from broadcast footage.
2. Extracting Match Data from Broadcast Footage
Normally, acquiring player tracking data involves installing multiple specialised cameras around the stadium or pre-registering the players' kit colours and shirt numbers in the system — steps that can be both time-consuming and technically demanding.
However, in cases like this, where we only have access to TV broadcast footage after the match, none of that prior setup is possible. In other words, the key challenge lies in how accurately AI can track the players and the ball using only the broadcast footage itself.
Several companies around the world have already achieved this. For instance, a company called SkillCorner appears to have developed technology capable of recreating match situations in real time using only TV broadcast footage.
They're undoubtedly a strong competitor — but I couldn’t help sharing their work, because it’s seriously impressive. At Playbox, we’re aiming to achieve something similar in the near future, so watch this space!
Now then, this kind of technology — extracting information such as who is where using only TV broadcast footage — is known as Game State Reconstruction (GSR). Meanwhile, automatically detecting what is happening (such as a pass or a shot) is called Action Spotting, and when it's specifically focused on the ball, it's referred to as Ball Action Spotting (BAS).
Game State Reconstruction (GSR)
GSR (Game State Reconstruction) is a technology that extracts the positional data of players and the ball from footage, and uses it to recreate the state of a match.
- Paper Breakdown: Technologies Featured in the SoccerNet GSR Paper
- Paper Breakdown: SoccerNet Game State Reconstruction
The current state-of-the-art (SOTA) approach has been outlined in a paper published in 2025, titled “From Broadcast to Minimap: Unifying Detection, Tracking, and Calibration for Real-Time Game State Reconstruction”. The method combines open-source AI tools such as YOLO-v5m, SegFormer, and DeepSORT with a proprietary dataset (unfortunately not publicly available), and claims to reconstruct game states at near real-time speed.
That said, this article won’t go into the technical details of that approach. Why? Because Playbox has developed its own, even more advanced GSR method — one that achieves similar levels of accuracy using a fully open dataset! We'll be sharing a dedicated deep-dive into our method in a future article, so stay tuned.
For now, this article will simply give you a basic overview of what GSR is and how it works.
Here’s a rough breakdown of how it works:
- The AI first detects and tracks each player and the ball from the TV broadcast footage (as shown in the bottom-left of the image, where players and the ball are identified).
- It then detects pitch lines and key reference points, and uses this information to convert the player positions in the footage into pitch-based absolute spatial coordinates (as shown in the top-left of the image, where camera calibration is applied).
- By combining these two steps, the system can accurately map the positions of players within the video onto a full-pitch visualisation (as shown on the right-hand side of the image).
In the example shown above, the system first extracts player position data from the broadcast footage (bottom-left), then estimates their pitch coordinates through calibration (top-left), and finally visualises the data as a full-pitch top-down map (right-hand side).
In some cases, it can also recognise shirt numbers, player roles, and team affiliation — but this is the basic mechanism behind GSR.
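The calibration step described above boils down to estimating a homography between the image plane and the pitch plane from detected reference points, then projecting each player's image position through it. Here is a minimal sketch using the classic Direct Linear Transform; the pixel coordinates and penalty-box correspondences in the usage example are invented for illustration, and real GSR systems detect these reference points automatically:

```python
import numpy as np

def fit_homography(image_pts, pitch_pts):
    """Estimate a 3x3 homography H mapping image coordinates to pitch
    coordinates from at least four point correspondences, using the
    Direct Linear Transform (null vector of the stacked constraints)."""
    A = []
    for (x, y), (X, Y) in zip(image_pts, pitch_pts):
        A.append([-x, -y, -1, 0, 0, 0, X * x, X * y, X])
        A.append([0, 0, 0, -x, -y, -1, Y * x, Y * y, Y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_pitch(H, x, y):
    """Project one image-space point onto pitch coordinates (metres)."""
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w

# Hypothetical correspondences: four penalty-box corners seen in the
# frame (pixels) matched to their known pitch positions (metres).
img = [(120, 400), (520, 380), (560, 620), (90, 650)]
pitch = [(16.5, 13.84), (16.5, 54.16), (0.0, 54.16), (0.0, 13.84)]
H = fit_homography(img, pitch)
print(to_pitch(H, 300, 500))  # a player's feet, mapped to pitch metres
```

Production systems refine this with many more reference points and per-frame smoothing, since broadcast cameras pan and zoom continuously.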
Ball Action Spotting (BAS)
If GSR is the technology for reconstructing the positions of players and the ball, then Ball Action Spotting (BAS) refers to the detection of events involving the ball — such as passes, shots, and dribbles — directly from the video footage.
This BAS task is also included in the SoccerNet Challenge, and the current state-of-the-art approach is a method called T-DEED, published in 2024. According to the paper, T-DEED can detect 12 types of ball-related events from video alone with an impressive 73.4% accuracy — all within a time window of less than one second. (You can find more details in the T-DEED paper.)
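Spotting methods of this kind typically score every frame for each event class, then collapse those scores into discrete timestamps. The sketch below shows that post-processing idea as a generic temporal peak-picking routine (this is our own simplified illustration, not T-DEED's actual decoder): keep local score maxima above a threshold and suppress weaker candidates within a short window of a stronger one.

```python
def spot_events(scores, fps, threshold=0.5, window=1.0):
    """Turn per-frame event scores for one class into event timestamps.

    Keeps frames whose score exceeds `threshold`, strongest first, and
    suppresses any other candidate within `window` seconds of a kept
    one (temporal non-maximum suppression). Returns times in seconds.
    """
    radius = int(window * fps)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, suppressed = [], set()
    for i in order:
        if scores[i] < threshold or i in suppressed:
            continue
        kept.append(i)
        for j in range(max(0, i - radius), min(len(scores), i + radius + 1)):
            suppressed.add(j)
    return sorted(i / fps for i in kept)
```

Real spotting models run one such score stream per event class (pass, shot, and so on) and tolerate small timing errors, which is why evaluation uses a time window rather than exact frame matches.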
Playbox is actively developing event detection systems like BAS as part of our main product. These systems are primarily designed to work with full-pitch footage, though, so they’re not yet ready to be applied directly to TV broadcast footage like the kind we used here. That’s something we’ll definitely be sharing more about in the future.
By combining technologies like GSR and BAS, we can bring together the key elements needed for deeper football analysis: who did what, where on the pitch — unlocking far more advanced insights into the game.
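Joining the two outputs is conceptually simple: each spotted event carries a time (and a ball position), and the tracking data tells you where every player was at that moment, so the event can be attributed to the nearest player. A minimal sketch, with invented record shapes (`tracking` as a dict of player samples is our own assumption):

```python
import math

def attribute_event(event_t, event_xy, tracking):
    """Attribute an event (time in seconds, ball position in metres)
    to the player closest to the ball at the nearest tracked moment.

    `tracking` maps player_id -> list of (t, x, y) samples.
    Returns the id of the attributed player.
    """
    ex, ey = event_xy
    best_player, best_dist = None, float("inf")
    for player_id, samples in tracking.items():
        # Take this player's sample nearest in time to the event.
        t, x, y = min(samples, key=lambda s: abs(s[0] - event_t))
        d = math.hypot(x - ex, y - ey)
        if d < best_dist:
            best_player, best_dist = player_id, d
    return best_player
```

In practice this join needs care around contested balls and tracking gaps, but it is the step that turns "something happened at minute 23" into "player 7 played a pass from the left half-space".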
3. Challenges in Data Extraction and How Playbox Tackles Them
The Limits of AI-Driven Data Extraction – How Much Can We Really Automate?
Technologies like GSR and BAS are constantly evolving, but there are still several challenges that remain difficult to overcome — especially when working with TV broadcast footage. Some of the most common issues include:
- Frequent camera switches: broadcasts often change angles rapidly, making it hard to consistently track the same player across the footage.
- Changes in resolution and framing: when players suddenly move out of frame, or the view shifts due to zooms or pans, it's difficult to keep track of players and the ball with precision.
- Player occlusion: when players overlap with one another, or when referees and staff obstruct the view, it becomes harder for AI to correctly detect individual players.
- Locating the ball in mid-air: while players are usually grounded, the ball often travels through the air, and estimating its position in 3D space from a 2D video feed is no easy task.
In fact, even in the short clips we used for this newspaper article, the AI wasn’t able to fully recognise all player positions or shirt numbers. We ended up manually verifying and correcting parts of the data — and as for event detection, we did that entirely by hand.
That said, if people are working on AGI and fully autonomous driving, then surely turning football footage into structured data is not an impossible task. Compared to those challenges, ours should be a walk in the park, and we're determined to make it happen.
Legal Considerations and the Challenges of Using Broadcast Footage
As mentioned earlier, beyond the technical hurdles, there are also legal issues that come with using broadcast footage for data extraction.
For example, it's still unclear how copyright laws apply when data extracted from video footage is turned into animations or visualisations. At this stage, we don’t have a definitive answer — but we’re planning to consult with a legal expert (solicitor) very soon. Once we’ve clarified the situation, we’ll be sure to share what we learn here.
In the meantime, if anyone reading this has expertise in this area, we’d love to hear from you — please don’t hesitate to get in touch!
Conclusion — In Football Analysis, Data Collection Is Just the Starting Line
In this article, we’ve walked through how football data can be extracted from video footage — along with some of the key challenges involved, and possible approaches for addressing them.
In football analysis, everything starts with extracting good data. That’s the foundation. At Playbox, our mission is to keep pushing forward the automation of data collection, so that more teams and players — at all levels — can access high-quality analysis without barriers.
We hope this article has helped deepen your understanding of football data, or sparked your curiosity to learn more!
In the next part of this series, we’ll show how the data we’ve collected can actually be used in real analysis — diving into concepts like xG (expected goals) and VAEP (Valuing Actions by Estimating Probabilities) to break down performance in more detail.
Playbox Inc. website 👉️ https://www.play-box.ai/
"playbox", an affordable AI sports camera that films and edits automatically 👉️ https://www.play-box.ai/lp
Contact
For questions, consultations, or business proposals, please feel free to contact Playbox at the email address below.