Introduction
Hello everyone! This is Atom from Playbox. Thank you always for your support!
Today, I’d like to share some behind-the-scenes details from the article published in the Asahi Shimbun (a major Japanese daily newspaper), in which we analysed the movements of Kaoru Mitoma, the 28-year-old left winger for Brighton & Hove Albion and the Japan national team. This article was created with the support of Rikuhei, a junior colleague from Nagoya University, and Yuki, a junior colleague from the University of Tsukuba.
Rikuhei is one of Japan's top football data analysis researchers, and the only Japanese researcher to have presented at the StatsBomb Conference.
Yuki is a football analyst and a computer vision researcher at the University of Tsukuba. We’ve shared the same undergraduate course, football team, and research lab, but I think he might understand the game on a deeper level than I do.
That was a bit of a long introduction, but I hope you enjoy this as a beginner's guide to understanding the cutting edge of football analysis. This article is the first in a three-part series.
- Behind the Scenes of Football Analytics: Part 1 – How to Extract Match Data from Broadcast Footage
- Behind the Scenes of Football Analytics: Part 2 – Analysing the Ball Carrier (xG, VAEP)
- Behind the Scenes of Football Analytics: Part 3 – Analysing Off-Ball Attacking Patterns (OBSO)
What data is used in football analytics?
Manual Input vs. Sensors vs. AI-based Tracking
In sports, there are roughly three main methods for collecting data:
- Manual Input: Watching footage and manually recording the positions of players and the ball, as well as specific actions.
- Sensors: Equipping players with GPS devices to collect data such as location and speed.
- AI-based Tracking (Video Analysis): Using artificial intelligence to automatically extract information such as player positions, ball position data, and match events from video footage.
Each method has its own strengths and weaknesses, which can be summarised as follows.
| Method | Cost | Accuracy | Real-Time Capability | Access to Opponent Data | Available Data Types |
| --- | --- | --- | --- | --- | --- |
| Manual Input | High | Good | No | Yes | Event data only |
| Sensors | High | Excellent | Yes | No | Tracking data only |
| AI-based Tracking (Video Analysis) | Medium | Good–Excellent | Limited–Moderate | Yes | Tracking data + Event data |
With AI-based tracking (video analysis), there is a trade-off between accuracy and processing time: improving accuracy tends to increase processing time, while faster processing may reduce accuracy. One of its major advantages, however, is that it allows analysts to obtain both tracking and event data, including information on opposing teams. Given that video analysts often handle the filming process themselves, this approach tends to be the most flexible and practical in real-world settings. Accuracy is also improving year by year, which is why Playbox is focusing heavily on AI-based tracking.
Sensor-based tracking systems, such as GPS with built-in accelerometers, have also improved significantly; errors are now typically within one metre. While combining them with AI-based tracking offers an ideal setup, sensor systems cannot capture data on the opposing team, and the need to attach and manage physical devices adds unexpected complexity and cost. They also lack event data like passes or shots, limiting their analytical value in some contexts.
Types of Video Footage Available for Football Analysis
There are three main types of video footage commonly used in football analysis:
- Broadcast Footage: This includes footage from televised matches, which is widely available to the public. While frequent camera switches and zoom-ins can make analysis challenging, the ease of access is a major advantage.
- Fixed Wide-Angle Camera Footage (used primarily by Playbox): This method uses a simple camera setup to continuously capture the entire pitch. The footage is stable and easy to analyse. The emphasis on capturing the whole pitch, however, comes at the expense of the resolution needed for detailed analysis of individual players — for example, it can be difficult to identify shirt numbers or subtle movements.
- Drone Footage: Captured from above, this footage is ideal for tactical analysis due to its wide coverage. However, it comes with high costs and legal restrictions.
Types of Data That Can Be Extracted from Video Footage
There are two main types of data that can be obtained from video footage:
- Tracking Data: Includes information such as player and ball positions, movement speed, and distance covered.
- Event Data: Refers to specific in-game actions such as passes, shots, and dribbles.
By combining these two types of data, video analysis can be used for a wide range of purposes — from detailed tactical analysis to evaluating individual player performance.
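As a minimal illustration of what these two data types look like in practice, here is a small Python sketch. The record shape and field names (`TrackingFrame`, `player_id`, coordinates in metres) are our own assumptions for this example, not a standard format; the point is that once tracking data exists as timestamped pitch coordinates, quantities such as distance covered fall out directly:

```python
import math
from dataclasses import dataclass

@dataclass
class TrackingFrame:
    t: float          # match time in seconds
    player_id: int
    x: float          # pitch coordinates in metres
    y: float

def distance_covered(frames):
    """Total distance (in metres) covered across consecutive frames,
    summing the straight-line step between each pair of samples."""
    frames = sorted(frames, key=lambda f: f.t)
    return sum(math.hypot(b.x - a.x, b.y - a.y)
               for a, b in zip(frames, frames[1:]))

# Two samples 3 m along and 4 m across -> 5 m covered
path = [TrackingFrame(0.0, 7, 0.0, 0.0), TrackingFrame(1.0, 7, 3.0, 4.0)]
print(distance_covered(path))
```

Event data, by contrast, is sparse: a short list of (timestamp, action, player) records rather than a dense per-frame stream, which is why the two complement each other so well.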
Our Reason for Choosing Broadcast Footage in This Analysis
The match analysed in this article — a World Cup qualifier played by the Japan national team — was not attended in person, so the only available match footage was from the TV broadcast. We confirmed with a newspaper company that analysing and writing an article based on this type of public footage poses no problem. However, whether it is legally acceptable to extract data from such footage and use it for commercial purposes remains unclear.
At Playbox, our standard approach is to receive footage directly from sports teams, perform analysis on it, and then provide results. Because of this, we have not yet encountered this particular issue firsthand.
Recently, though, we’ve received inquiries from content creators who say, for example, “Since we can’t stream actual match footage on YouTube, we’d like to recreate the match using animation based on data extracted from the video.” This raises some legal grey areas, and we plan to consult a solicitor soon to clarify the situation.
Incidentally, when we asked ChatGPT about this, it responded:
“If the footage is copyrighted, any data or animation derived from it may be considered a ‘derivative work.’ Therefore, for commercial use, it’s advisable to obtain permission from the copyright holder. That said, if the extracted data is judged to be ‘factual information without originality,’ permission may not be required. The decision ultimately depends on the specific case, so consulting a legal expert is recommended.”
Makes sense. We’re going to consult a solicitor shortly.
In the next section, we’ll explain in detail how data is extracted from broadcast footage.
2. Extracting Match Data from Broadcast Footage
Normally, acquiring player tracking data involves installing multiple specialised cameras around the stadium or pre-registering the players' kit colours and shirt numbers in the system — steps that can be both time-consuming and technically demanding.
However, in cases like this, where we only have access to TV broadcast footage after the match, none of that prior setup is possible. In other words, the key challenge lies in how accurately AI can track the players and the ball using only the broadcast footage itself.
Several companies around the world have already achieved this. For instance, a company called SkillCorner appears to have developed technology capable of recreating match situations in real time using only TV broadcast footage.
They're undoubtedly a strong competitor — but I couldn’t help sharing their work, because it’s seriously impressive. At Playbox, we’re aiming to achieve something similar in the near future, so watch this space!
Now then, this kind of technology — extracting information such as who is where using only TV broadcast footage — is known as Game State Reconstruction (GSR). Meanwhile, automatically detecting what is happening (such as a pass or a shot) is called Action Spotting, and when it's specifically focused on the ball, it's referred to as Ball Action Spotting (BAS).
Game State Reconstruction (GSR)
GSR (Game State Reconstruction) is a technology that extracts the positional data of players and the ball from footage, and uses it to recreate the state of a match.
- Paper Breakdown: Technologies Featured in the SoccerNet GSR Paper
- Paper Breakdown: SoccerNet Game State Reconstruction
The current state-of-the-art (SOTA) approach has been outlined in a paper published in 2025, titled “From Broadcast to Minimap: Unifying Detection, Tracking, and Calibration for Real-Time Game State Reconstruction”. The method combines open-source AI tools such as YOLO-v5m, SegFormer, and DeepSORT with a proprietary dataset (unfortunately not publicly available), and claims to reconstruct game states at near real-time speed.
That said, this article won’t go into the technical details of that approach. Why? Because Playbox has developed its own, even more advanced GSR method — one that achieves similar levels of accuracy using a fully open dataset! We'll be sharing a dedicated deep-dive into our method in a future article, so stay tuned.
For now, this article will simply give you a basic overview of what GSR is and how it works.
Here’s a rough breakdown of how it works:
- The AI first detects and tracks each player and the ball from the TV broadcast footage (as shown in the bottom-left of the image, where players and the ball are identified).
- It then detects pitch lines and key reference points, and uses this information to convert the player positions in the footage into pitch-based absolute spatial coordinates (as shown in the top-left of the image, where camera calibration is applied).
- By combining these two steps, the system can accurately map the positions of players within the video onto a full-pitch visualisation (as shown on the right-hand side of the image).
In the example shown above, the system first extracts player position data from the broadcast footage (bottom-left), then estimates their pitch coordinates through calibration (top-left), and finally visualises the data as a full-pitch top-down map (right-hand side).
In some cases, it can also recognise shirt numbers, player roles, and team affiliation — but this is the basic mechanism behind GSR.
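The calibration step described above boils down to estimating a homography between the image plane and the pitch plane from detected reference points, then projecting each player's image position through it. Here is a minimal sketch using the classic Direct Linear Transform; the pixel coordinates and penalty-box correspondences in the usage example are invented for illustration, and real GSR systems detect these reference points automatically:

```python
import numpy as np

def fit_homography(image_pts, pitch_pts):
    """Estimate a 3x3 homography H mapping image coordinates to pitch
    coordinates from at least four point correspondences, using the
    Direct Linear Transform (null vector of the stacked constraints)."""
    A = []
    for (x, y), (X, Y) in zip(image_pts, pitch_pts):
        A.append([-x, -y, -1, 0, 0, 0, X * x, X * y, X])
        A.append([0, 0, 0, -x, -y, -1, Y * x, Y * y, Y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_pitch(H, x, y):
    """Project one image-space point onto pitch coordinates (metres)."""
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w

# Hypothetical correspondences: four penalty-box corners seen in the
# frame (pixels) matched to their known pitch positions (metres).
img = [(120, 400), (520, 380), (560, 620), (90, 650)]
pitch = [(16.5, 13.84), (16.5, 54.16), (0.0, 54.16), (0.0, 13.84)]
H = fit_homography(img, pitch)
print(to_pitch(H, 300, 500))  # a player's feet, mapped to pitch metres
```

Production systems refine this with many more reference points and per-frame smoothing, since broadcast cameras pan and zoom continuously.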
Ball Action Spotting (BAS)
If GSR is the technology for reconstructing the positions of players and the ball, then Ball Action Spotting (BAS) refers to the detection of events involving the ball — such as passes, shots, and dribbles — directly from the video footage.
This BAS task is also included in the SoccerNet Challenge, and the current state-of-the-art approach is a method called T-DEED, published in 2024. According to the paper, T-DEED can detect 12 types of ball-related events from video alone with an impressive 73.4% accuracy — all within a time window of less than one second. (You can find more details in the T-DEED paper.)
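Spotting methods of this kind typically score every frame for each event class, then collapse those scores into discrete timestamps. The sketch below shows that post-processing idea as a generic temporal peak-picking routine (this is our own simplified illustration, not T-DEED's actual decoder): keep local score maxima above a threshold and suppress weaker candidates within a short window of a stronger one.

```python
def spot_events(scores, fps, threshold=0.5, window=1.0):
    """Turn per-frame event scores for one class into event timestamps.

    Keeps frames whose score exceeds `threshold`, strongest first, and
    suppresses any other candidate within `window` seconds of a kept
    one (temporal non-maximum suppression). Returns times in seconds.
    """
    radius = int(window * fps)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, suppressed = [], set()
    for i in order:
        if scores[i] < threshold or i in suppressed:
            continue
        kept.append(i)
        for j in range(max(0, i - radius), min(len(scores), i + radius + 1)):
            suppressed.add(j)
    return sorted(i / fps for i in kept)
```

Real spotting models run one such score stream per event class (pass, shot, and so on) and tolerate small timing errors, which is why evaluation uses a time window rather than exact frame matches.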
Playbox is actively developing event detection systems like BAS as part of our main product. These systems are primarily designed to work with full-pitch footage, though, so they’re not yet ready to be applied directly to TV broadcast footage like the kind we used here. That’s something we’ll definitely be sharing more about in the future.
By combining technologies like GSR and BAS, we can bring together the key elements needed for deeper football analysis: who did what, where on the pitch — unlocking far more advanced insights into the game.
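Joining the two outputs is conceptually simple: each spotted event carries a time (and a ball position), and the tracking data tells you where every player was at that moment, so the event can be attributed to the nearest player. A minimal sketch, with invented record shapes (`tracking` as a dict of player samples is our own assumption):

```python
import math

def attribute_event(event_t, event_xy, tracking):
    """Attribute an event (time in seconds, ball position in metres)
    to the player closest to the ball at the nearest tracked moment.

    `tracking` maps player_id -> list of (t, x, y) samples.
    Returns the id of the attributed player.
    """
    ex, ey = event_xy
    best_player, best_dist = None, float("inf")
    for player_id, samples in tracking.items():
        # Take this player's sample nearest in time to the event.
        t, x, y = min(samples, key=lambda s: abs(s[0] - event_t))
        d = math.hypot(x - ex, y - ey)
        if d < best_dist:
            best_player, best_dist = player_id, d
    return best_player
```

In practice this join needs care around contested balls and tracking gaps, but it is the step that turns "something happened at minute 23" into "player 7 played a pass from the left half-space".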
3. Challenges in Data Extraction and How Playbox Tackles Them
The Limits of AI-Driven Data Extraction – How Much Can We Really Automate?
Technologies like GSR and BAS are constantly evolving, but there are still several challenges that remain difficult to overcome — especially when working with TV broadcast footage. Some of the most common issues include:
- Frequent camera switches: broadcasts often change angles rapidly, making it hard to consistently track the same player across the footage.
- Changes in resolution and framing: when players suddenly move out of frame, or the view shifts due to zooms or pans, it's difficult to keep track of players and the ball with precision.
- Player occlusion: when players overlap with one another, or when referees and staff obstruct the view, it becomes harder for AI to correctly detect individual players.
- Locating the ball in mid-air: while players are usually grounded, the ball often travels through the air, and estimating its position in 3D space from a 2D video feed is no easy task.
In fact, even in the short clips we used for this newspaper article, the AI wasn’t able to fully recognise all player positions or shirt numbers. We ended up manually verifying and correcting parts of the data — and as for event detection, we did that entirely by hand.
That said, if people are working on AGI and fully autonomous driving, then surely turning football footage into structured data is not an impossible task. Compared to those challenges, ours should be a walk in the park, and we're determined to make it happen.
Legal Considerations and the Challenges of Using Broadcast Footage
As mentioned earlier, beyond the technical hurdles, there are also legal issues that come with using broadcast footage for data extraction.
For example, it's still unclear how copyright laws apply when data extracted from video footage is turned into animations or visualisations. At this stage, we don’t have a definitive answer — but we’re planning to consult with a legal expert (solicitor) very soon. Once we’ve clarified the situation, we’ll be sure to share what we learn here.
In the meantime, if anyone reading this has expertise in this area, we’d love to hear from you — please don’t hesitate to get in touch!
Conclusion — In Football Analysis, Data Collection Is Just the Starting Line
In this article, we’ve walked through how football data can be extracted from video footage — along with some of the key challenges involved, and possible approaches for addressing them.
In football analysis, everything starts with extracting good data. That’s the foundation. At Playbox, our mission is to keep pushing forward the automation of data collection, so that more teams and players — at all levels — can access high-quality analysis without barriers.
We hope this article has helped deepen your understanding of football data, or sparked your curiosity to learn more!
In the next part of this series, we’ll show how the data we’ve collected can actually be used in real analysis — diving into concepts like xG (expected goals) and VAEP (Valuing Actions by Estimating Probabilities) to break down performance in more detail.
Playbox Inc. website 👉️ https://www.play-box.ai/
"playbox", an affordable AI sports camera that films and edits automatically 👉️ https://www.play-box.ai/lp
Contact
For questions, consultations, or business proposals, please feel free to contact Playbox at the email address below.