| Holo3 | Latest-generation Vision-Language Model (VLM) for GUI agents that interact with real digital environments (web, desktop, mobile). |
| Holo2 | Previous-generation model that Holo3 improves upon. |
| Vision-Language Model (VLM) | A model that understands both visual inputs (like UI screens) and text, enabling it to interpret interfaces and perform actions. |
| GUI Agents | AI agents that operate graphical user interfaces by observing screens, reasoning about them, and executing actions. |
| Computer Use (CU) | The ability of an AI system to perform tasks on a computer, such as navigating interfaces and executing commands. |
| OSWorld | Benchmark evaluating performance in real Ubuntu desktop environments. |
| WebVoyager / WebArena | Benchmarks for testing web navigation and task completion abilities. |
| AndroidWorld | Benchmark for evaluating performance in Android mobile environments. |
| Navigation (in AI agents) | The process of completing tasks through multi-step reasoning and actions across interfaces. |
| Policy Learning | Training method where the model learns which actions to take in different situations. |
| Action Grounding | Connecting model decisions to actual executable actions in an environment. |
| Cross-environment Generalization | Ability to perform well across different platforms (web, desktop, mobile), including unseen environments. |
| Holo3-35B-A3B | Smaller model variant, released as open source under the Apache 2.0 license. |
| Holo3-122B-A10B | Larger model variant, released under a research-only (non-commercial) license. |
| Surfer-H | Example next-generation computer-use agent built on Holo3. |
| Qwen/Qwen3.5-35B-A3B | Base model used for fine-tuning Holo3. |
| Supervised Fine-Tuning (SFT) | Training stage where the model learns from labeled examples. |
| Reinforcement Learning (GRPO) | Training stage in which the model improves from reward feedback on the actions it takes, here using GRPO (Group Relative Policy Optimization). |
| Synthetic Data | Artificially generated data used to supplement training. |
| Human-Annotated Data | Data labeled by humans to improve model accuracy. |
| State-of-the-Art (SOTA) | Performance that is among the best currently available. |
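
The GUI Agents, Navigation, and Action Grounding entries above describe an observe, reason, act loop. The sketch below is a toy illustration of that loop only, not Holo3's or Surfer-H's actual interface: `FakeEnvironment`, `Action`, `scripted_policy`, and `run_agent` are all hypothetical names, the "screen" is a small dictionary instead of a screenshot, and the policy is a hand-written rule rather than a VLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """A grounded action: something the environment can actually execute."""
    kind: str          # "type", "click", or "done" (hypothetical action set)
    target: str = ""   # UI element the action applies to
    text: str = ""     # text to enter, for "type" actions


class FakeEnvironment:
    """Hypothetical stand-in for a real GUI: one text field and a submit button."""

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def observe(self) -> Dict[str, object]:
        # A real agent would receive a screenshot here; this toy returns state directly.
        return {"field": self.field, "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.kind == "type" and action.target == "name":
            self.field = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_policy(observation: Dict[str, object], task: str) -> Action:
    """Hand-written rule standing in for the model's decision at each step."""
    if observation["submitted"]:
        return Action("done")
    if observation["field"] != task:
        return Action("type", target="name", text=task)
    return Action("click", target="submit")


def run_agent(
    env: FakeEnvironment,
    policy: Callable[[Dict[str, object], str], Action],
    task: str,
    max_steps: int = 10,
) -> List[Action]:
    """Multi-step navigation: observe, choose an action, execute, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = env.observe()
        action = policy(observation, task)
        history.append(action)
        if action.kind == "done":
            break
        env.execute(action)
    return history


if __name__ == "__main__":
    env = FakeEnvironment()
    for step in run_agent(env, scripted_policy, "alice"):
        print(step)
```

In a real agent, `observe` would return pixels, the policy would be the model, and `execute` would translate each chosen action into mouse and keyboard events. The `max_steps` cap is a design choice of this sketch so that a policy that never emits `done` still terminates.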