Training process
Our training follows two key phases:
- Large-scale supervised fine-tuning: The model learns from labeled data to predict actions accurately.
- Online reinforcement learning (GRPO): The model is further refined through interaction with environments, optimizing performance on real-world tasks.
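GRPO (Group Relative Policy Optimization) scores each sampled rollout against the other rollouts drawn for the same task, so no separate value model is needed. The sketch below shows only the group-relative advantage computation; the reward values, group size, and tensor shapes are illustrative assumptions, not details of our training setup.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Compute GRPO-style advantages.

    rewards: tensor of shape (num_tasks, group_size), one scalar reward per
    sampled rollout. Each rollout is compared to the other rollouts sampled
    for the same task, replacing a learned value baseline.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)


# Illustrative usage: 4 tasks, 8 rollouts per task, rewards in [0, 1].
rewards = torch.rand(4, 8)
advantages = group_relative_advantages(rewards)
# Rollouts that beat their group's average get positive advantages and are
# reinforced; below-average rollouts are pushed down.
```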
Benchmarks
Benchmarks provide an objective way of measuring model capabilities. Our Holo models are assessed based on benchmarks in UI localization and screen content understanding.

UI Localization
These benchmarks evaluate an agent’s ability to precisely locate elements on a screen (buttons, text boxes, images, etc.). This is critical for agents performing interactions in GUIs.

Holo1.5
Tested on Screenspot-V2, Screenspot-Pro, GroundUI-Web, Showdown, and WebClick:
- The 7B and 72B models achieve an average 4.5% improvement over prior models.
- The 7B and 3B models remain competitive with models one weight class above, delivering fast and efficient inference for UI localization.
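Grounding benchmarks of this kind are typically scored by checking whether the model’s predicted click point falls inside the target element’s bounding box. Below is a minimal sketch of that accuracy metric; the point and box representations are assumptions for illustration, not the benchmarks’ official harnesses.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y) predicted click location in pixels
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) target element

def click_in_box(point: Point, box: Box) -> bool:
    """A prediction counts as correct if the click lands inside the target box."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max

def localization_accuracy(preds: List[Point], targets: List[Box]) -> float:
    """Fraction of predicted clicks that hit their target element."""
    hits = sum(click_in_box(p, b) for p, b in zip(preds, targets))
    return hits / len(targets)

# Illustrative usage on two examples.
preds = [(120.0, 45.0), (300.0, 410.0)]
targets = [(100.0, 30.0, 180.0, 60.0), (0.0, 0.0, 50.0, 50.0)]
print(localization_accuracy(preds, targets))  # 0.5
```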
Holo1
Evaluated on Screenspot, Screenspot-V2, Screenspot-Pro, GroundUI-Web, and WebClick, demonstrating strong localization capabilities in real-world scenarios.

Screen Content Understanding (Holo1.5)
Beyond localization, understanding UI structure and functionality is essential. Holo1.5 was evaluated on GUI-focused QA benchmarks:
- ScreenQA Short & Complex
- VisualWebBench
- WebSRC
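Short-answer screen QA benchmarks like these are commonly scored with normalized exact match between the model’s answer and the reference answers. A minimal sketch of such a metric follows; the normalization rules are an illustrative assumption rather than the benchmarks’ official scoring scripts.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    """An answer is correct if it matches any reference after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)

# Illustrative usage.
print(exact_match("The Settings tab", ["settings tab"]))  # True
```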