| Term | Definition |
| --- | --- |
| Holo3 | Latest-generation Vision-Language Model (VLM) for GUI agents that can interact with real digital environments (web, desktop, mobile). |
| Holo2 | Previous-generation model that Holo3 improves upon. |
| Vision-Language Model (VLM) | A model that understands both visual inputs (like UI screens) and text, enabling it to interpret interfaces and perform actions. |
| GUI Agents | AI agents that operate graphical user interfaces by observing screens, reasoning about them, and executing actions. |
| Computer Use (CU) | The ability of an AI system to perform tasks on a computer, such as navigating interfaces and executing commands. |
| OSWorld | Benchmark evaluating performance in real Ubuntu desktop environments. |
| WebVoyager / WebArena | Benchmarks for testing web navigation and task-completion abilities. |
| AndroidWorld | Benchmark for evaluating performance in mobile environments. |
| Navigation (in AI agents) | The process of completing tasks through multi-step reasoning and actions across interfaces. |
| Policy Learning | Training method where the model learns which actions to take in different situations. |
| Action Grounding | Connecting model decisions to actual executable actions in an environment. |
| Cross-environment Generalization | Ability to perform well across different platforms (web, desktop, mobile), including unseen environments. |
| Holo3-35B-A3B | Smaller model variant, fully open-source under Apache 2.0. |
| Holo3-122B-A10B | Larger model variant, research-only license (non-commercial). |
| Surfer-H | Example next-generation computer-use agent built on Holo3. |
| Qwen/Qwen3.5-35B-A3B | Base model used for fine-tuning Holo3. |
| Supervised Fine-Tuning (SFT) | Training stage where the model learns from labeled examples. |
| Reinforcement Learning (GRPO) | Training method where the model improves through feedback based on its actions. |
| Synthetic Data | Artificially generated data used to supplement training. |
| Human-Annotated Data | Data labeled by humans to improve model accuracy. |
| State-of-the-Art (SOTA) | Performance that is among the best currently available. |
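To make the "action grounding" entry concrete, here is a minimal sketch of how a model's textual action prediction might be parsed into a structured, executable action. The action-string format (`click(x, y)`, `type("...")`) is an assumption chosen for illustration, not Holo3's actual output schema.

```python
import re
from dataclasses import dataclass

# Hypothetical executable action types; a real GUI agent would map
# these onto actual browser/OS events.
@dataclass
class ClickAction:
    x: int
    y: int

@dataclass
class TypeAction:
    text: str

def ground_action(raw: str):
    """Parse a model-emitted action string into an executable action object.

    Raises ValueError if the string matches no known action pattern,
    i.e. the model's decision could not be grounded in the environment.
    """
    raw = raw.strip()
    m = re.fullmatch(r"click\((\d+),\s*(\d+)\)", raw)
    if m:
        return ClickAction(x=int(m.group(1)), y=int(m.group(2)))
    m = re.fullmatch(r'type\("(.*)"\)', raw)
    if m:
        return TypeAction(text=m.group(1))
    raise ValueError(f"Ungroundable action: {raw!r}")
```

For example, `ground_action("click(120, 340)")` yields a `ClickAction` with pixel coordinates the agent can execute, while an unparseable string fails loudly rather than producing an undefined action.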