How We’ve Created a Robust AI for Our Sports Augmented Reality (AR) Experience ARISE in-Stadium
31/03/2021

Discover in this article from our Data Lab the work done to create a robust deep learning model that allows sports fans to enjoy a real-time augmented reality experience in stadiums! Artificial Intelligence (AI) plays a significant role in the growth of Immersiv.io’s products: it is critical for us to provide our users with a fast, simple-to-use experience at their fingertips, built on the most cutting-edge AI technologies.
Artificial Intelligence in the sports field: a never-ending story
AI has taken a prominent role in today’s culture as a result of recent technological developments, putting a plethora of options for technical advancement and task automation at our disposal.
Computer vision is one of the most powerful and interesting branches of AI; you’ve probably encountered it in a number of ways without even realizing it. In recent years, machines have become quicker and more effective than humans at detecting and identifying objects. The global computer vision market is projected to reach USD 19.1 billion by 2027, making it a booming segment of AI.
These technical advances are not new to the sports industry. Computer vision is used in a variety of areas, including sports performance analysis through athlete identification and tracking, digital advertising, and fan engagement. With the rising democratization of Mixed Reality, the fan experience takes on a whole new meaning. The possibilities are infinite, whether for in-stadium experiences that are seriously lacking in visual content or for at-home experiences for fans looking for an immersive feeling from the comfort of their sofa.
ARISE, the AR experience for sports fans
ARISE is an award-winning solution developed by Immersiv.io. It allows fans in the stadium to enhance their experience across different sports through Augmented Reality content: players’ statistics or more general game data are displayed directly on the pitch, in real time. In this article, we will cover, from a machine learning standpoint, the work done by our Data Lab to create the most robust cross-platform AR experience for sports fans.
ARISE: When data science meets augmented reality to create a never-seen-before fan experience
As previously mentioned, ARISE provides real-time access to information about what is happening in the game using AR content. To accomplish this, a Deep Learning model that meets performance and accuracy requirements must be designed.
The user experience must be as simple and ergonomic as possible. The user taps directly on the pitch through a phone or smart glasses to access the content and is then presented with a variety of options: the instantaneous speed of a particular player, a heat map, the ball carrier’s passing options, and so on. This helps users gain a deeper understanding of the game by showing what is happening on the field and providing access to teams’ and players’ statistical insights. Interacting with others via social media is also possible, such as sharing replays from the stadium stands or reacting to other people’s posts.

This leads to some technological considerations. First and foremost, using a single picture taken by the user, we must be able to locate the user in three-dimensional space. This is a critical step, since our algorithm uses the user’s exact location in the stands to estimate their position relative to the playing field.
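While the article doesn’t detail the localization algorithm itself, one standard way to recover a camera’s position from a single image is Perspective-n-Point (PnP): match known 3D landmarks on the pitch (corners, line intersections) with their detected 2D pixel positions and solve for the camera pose. Below is a minimal sketch using OpenCV, with illustrative keypoints and intrinsics (not ARISE’s actual method):

```python
import numpy as np
import cv2

# Known pitch landmarks in world coordinates (metres), e.g. the corners
# of a 105 x 68 m football pitch -- illustrative values only.
object_points = np.array([
    [0.0,   0.0,  0.0],
    [105.0, 0.0,  0.0],
    [105.0, 68.0, 0.0],
    [0.0,   68.0, 0.0],
], dtype=np.float64)

# Pixel positions where those landmarks were detected in the user's photo;
# in practice these would come from a deep learning model.
image_points = np.array([
    [210.0,  880.0],
    [1700.0, 860.0],
    [1450.0, 420.0],
    [400.0,  430.0],
], dtype=np.float64)

# Camera intrinsics from the device (focal lengths and principal point).
camera_matrix = np.array([
    [1500.0, 0.0,    960.0],
    [0.0,    1500.0, 540.0],
    [0.0,    0.0,    1.0],
], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, None)

# Camera position in pitch coordinates: C = -R^T @ t
R, _ = cv2.Rodrigues(rvec)
camera_position = (-R.T @ tvec).ravel()
print("estimated user position (m):", camera_position)
```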
Second, we must consider the product’s temporal aspect. Latency must be kept low for a seamless experience, so that the user receives statistics in real time, displayed on the pitch without any visual lag.
We also had to adhere to two constraints. The first is the range of devices our product must work on. Smart glasses are becoming increasingly common, with major players such as Apple and Facebook planning to join the market soon. As a result, it was crucial for us to anticipate this shift and create a tool that works on both smart glasses and smartphones. The second constraint was the application area, which had to be as broad as possible, i.e. applicable to a variety of sports. Our product is currently available for football (soccer), basketball, tennis, and ice hockey.
These questions formed the basis of our thinking as we designed and deployed a robust approach.
Challenges: How to create an accurate deep learning model working for every fan in the stadium
To put it all together, our data science team had to create a deep learning model capable of pinpointing the exact position of the user in the stadium stands, as well as their perspective on what’s happening on the field, using only one picture and the sensors embedded in the user’s device. This is a challenging task in and of itself, but what if we want to take it a step further? To do so, our team set the following objectives:
- Scalability: Our deep learning model should provide the same accuracy whether it’s day or night, sunny or gloomy, in Munich’s Allianz Arena or Manchester’s Etihad Stadium, without having to collect new data or train a new model for each stadium.
- Cross-platform: Since different smartphones have different sensors with different accuracies, our model should be able to provide the same experience for each device, as well as for Augmented Reality glasses such as Nreal and Magic Leap, which have their own set of requirements.

- Versatility: ARISE is a multi-sport experience. Our team designed a deep learning architecture that allows us to train models for many sports, the only difference being the data labeling. This was accomplished through dynamic training, which enabled us to adjust the parameters of our architecture for better learning.
- Real-time: ARISE is a live experience: what you see on the field is translated into real-time statistics on your device with little to no latency. A multitude of breakthrough technologies are employed by our Dev Lab to achieve this, including 5G edge computing, and our Data Science team is no exception: having models capable of providing real-time results is a must. Two approaches were developed and heavily optimized: Cloud and Edge (on-device) computing. Each has its own advantages and disadvantages. Running a deep learning model on the Edge (on the device itself) is ideal for low-latency problems like ours, since moving data processing onto the device eliminates data transfer; what Cloud computing lacks in speed, on the other hand, it makes up for in power and capacity, allowing for large-scale data analysis (see the sketch after this list).
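To make the trade-off concrete, here is a minimal sketch of how a hybrid setup could route a frame between the two back ends; `edge_model` and `cloud_client` are hypothetical interfaces, not part of the ARISE codebase:

```python
import time

FRAME_BUDGET_MS = 33.0  # ~30 fps target

def run_inference(frame, edge_model, cloud_client):
    """Prefer on-device (Edge) inference for latency; fall back to the
    Cloud, which trades speed for compute power and capacity."""
    start = time.perf_counter()
    result = edge_model.predict(frame)          # no network round trip
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if result is None or elapsed_ms > FRAME_BUDGET_MS:
        result = cloud_client.predict(frame)    # heavier, network-bound path
    return result
```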
Creating a Fast & Accurate Model
At the time of writing, our model is accurate in new stadiums with little to no prior knowledge of them, as well as under various real-world conditions. It is also capable of pinpointing the user’s exact location relative to the pitch with less than 30 cm of error, across sports that currently include football (soccer), tennis, ice hockey, and basketball. ARISE also works on Android and iOS smartphones as well as Augmented Reality glasses: Microsoft’s HoloLens, Nreal, and Magic Leap for now.
Finally, our Cloud-based model runs in real time, returning results in approximately 34 milliseconds per query, allowing us to reach ~29 fps (frames/images per second). Our Edge-based model, on the other hand, returns results in approximately 11 milliseconds, allowing us to reach 90 fps using the Apple Neural Engine (ANE), the dedicated neural network chipset on iOS devices. It comes, however, with the drawbacks of having to accumulate data locally and send it periodically to our servers for big data analysis, as well as having to account for the model’s size on the device.
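These frame rates follow directly from the per-query latency (frames per second ≈ 1000 ms ÷ milliseconds per query):

```python
# Throughput implied by per-query latency: fps ~= 1000 ms / latency_ms
for backend, latency_ms in [("cloud", 34), ("edge", 11)]:
    print(f"{backend}: ~{1000 // latency_ms} fps")
# cloud: ~29 fps
# edge: ~90 fps
```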

Inference time per experiment on Edge vs Cloud
So, how did we do it? To be accurate, our model must withstand a number of real-world conditions: changing lighting, pitch patterns, ambient stadium colors, and a field of view partially obstructed by other spectators, to name a few. It is virtually impossible to reflect every stadium, let alone every situation, with a small dataset, since doing so would require defining each scenario, collecting data from a large number of stadiums, and annotating and processing all of it, which is neither cost-effective nor practical. As a result, more effort was needed in data processing and model architecture engineering.
We began by designing a data processing pipeline that allows us to annotate the bare minimum of data without compromising data quality. The processed dataset was then augmented to take into account the different issues that might arise in practice. These issues were identified through automatic detection of false positives and false negatives across multiple tests, which were then analyzed to generate similar synthetic data to add to our dataset.
This pipeline takes our approach a step further by allowing on-the-fly data preparation: the data is generated using multi-threading on the CPU (Central Processing Unit) while the model is simultaneously being trained on the GPU (Graphics Processing Unit), which leads to faster training while also minimizing RAM (Random-Access Memory) usage. Since training deep learning models is a time-consuming process, reducing training time is important for rapidly testing multiple techniques.
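As an illustration of this overlapped CPU/GPU work, here is a minimal sketch of an on-the-fly pipeline using PyTorch’s DataLoader; the dataset, augmentation, and shapes are placeholders, not Immersiv.io’s actual pipeline:

```python
import torch
from torch.utils.data import Dataset, DataLoader

def augment(image, mask):
    # Placeholder augmentation: a random horizontal flip standing in for
    # the synthetic variations (lighting, occlusions, ...) described above.
    if torch.rand(1).item() < 0.5:
        image, mask = image.flip(-1), mask.flip(-1)
    return image, mask

class StadiumDataset(Dataset):
    """Generates augmented samples in __getitem__, so DataLoader worker
    processes prepare batches on the CPU while the GPU trains."""

    def __init__(self, images, masks):
        self.images, self.masks = images, masks

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return augment(self.images[idx], self.masks[idx])

# Dummy tensors standing in for annotated stadium images and pitch masks.
images = torch.rand(256, 3, 256, 256)
masks = (torch.rand(256, 1, 256, 256) > 0.5).float()

loader = DataLoader(
    StadiumDataset(images, masks),
    batch_size=16,
    shuffle=True,
    num_workers=4,    # CPU workers build the next batches...
    pin_memory=True,  # ...while the GPU consumes the current one
)
```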
Furthermore, we updated our deep learning model’s architecture to reduce overfitting on our small dataset, allowing us to reduce the variance of our model just enough to make good predictions on the validation set and to generalize better over images taken from other distributions. Dropout layers, which can be thought of as randomly obscuring features that our model has learned, are one of the techniques used. This way, our model won’t rely on features replicated across our dataset that aren’t visible in other distributions, such as identical team emblems appearing in the stadium.
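For illustration, this is how dropout layers are typically inserted into a convolutional network (a generic PyTorch sketch, not ARISE’s actual architecture):

```python
import torch.nn as nn

# Dropout2d zeroes entire feature maps at random during training, so the
# network cannot rely on any single learned feature (e.g. a team emblem
# that happens to appear in every training stadium).
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),
    nn.Conv2d(64, 1, kernel_size=1),  # e.g. a 1-channel pitch mask
)

model.train()  # dropout active: random feature maps are zeroed
model.eval()   # dropout disabled at inference time
```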


Dice coefficient evolution on training data
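For reference, the Dice coefficient tracked above measures the overlap between a predicted mask and its ground truth, ranging from 0 (no overlap) to 1 (perfect match):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(pred, target))  # 2*2 / (3 + 3) ~= 0.67
```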
However, when dealing with real-time prediction, enhancing the model’s accuracy isn’t the only goal; there is a trade-off between prediction time and prediction accuracy that must be met. An in-depth framework-level optimization was carried out with the goal of achieving fast prediction while maintaining our high prediction accuracy on both Cloud and Edge. This included updating the open-source Core ML tooling to take our custom-made architecture into account, enabling it to use the full potential of hardware acceleration on smartphones and AR/MR smart glasses, which allowed us to achieve the same efficiency as our cloud-based solution while removing network latency.
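As a rough illustration of that deployment path, converting a model with Apple’s open-source coremltools and requesting full hardware acceleration looks like this (a generic sketch with a placeholder network; the custom framework-level changes described above are not shown):

```python
import torch
import coremltools as ct

# Placeholder network standing in for the real pitch-registration model.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval()

example = torch.rand(1, 3, 256, 256)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    # Let Core ML schedule work across the CPU, GPU, and the Apple
    # Neural Engine (ANE) for maximum hardware acceleration.
    compute_units=ct.ComputeUnit.ALL,
    convert_to="mlprogram",
)
mlmodel.save("PitchModel.mlpackage")
```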

Creating an AI that is scalable across different sports, robust to real-life conditions, accurate on unseen data, and cross-platform (smartphones and AR/MR smart glasses), all with minimal changes, proved to be a challenging task that our team has proudly accomplished. But it’s only one aspect of a much larger set of tasks that includes real-time logging and tracking, cloud-level optimization to minimize latency, and making full use of the capabilities of users’ devices, such as embedded sensors; you’ll discover more on that in later articles.

This article was written by Immersiv.io’s Data Lab, composed of Data Scientists experimenting and reinventing machine learning models to create the most reliable and innovative immersive experiences for sports fans.
Special thanks to Sakher and Alexandre!