Key terms used across the Models API docs, grouped by theme.

Models and families

| Term | Definition |
|---|---|
| Holo3.1 | Latest-generation Vision-Language Model (VLM) family for GUI agents that interact with real digital environments (web, desktop, mobile). |
| Holo3.1 family | Model sizes from 0.8B to 35B-A3B, covering deployments from on-device to state-of-the-art. |
| Holo3.1-35B-A3B | Open-source (Apache 2.0) model variant, available in BF16, FP8, NVFP4, and Q4 GGUF for cloud and local inference. |
| Holo3 | Prior generation that Holo3.1 builds on. |
| Holo2 | Earlier generation that Holo3 improved upon. |
| Qwen/Qwen3.5-35B-A3B | Base model from which Holo3.1-35B-A3B is fine-tuned. |
| Surfer-H | Example of a next-generation computer-use agent built on the Holo model family. |
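The quantization formats listed for Holo3.1-35B-A3B mostly matter for memory. The sketch below is a rough, back-of-the-envelope estimate, not an official sizing guide: it assumes about 35B total parameters (read off the model name) and nominal bits per weight. Assuming the usual convention that the `A3B` suffix denotes a mixture-of-experts model with roughly 3B parameters active per token, every parameter must still be stored, so the full count is used. `NOMINAL_BITS` and `weight_memory_gb` are illustrative names, not part of the Models API.

```python
# Nominal bits per weight for the formats listed in the table.
# Assumption: per-block scales (NVFP4, GGUF) and any layers left in higher
# precision are ignored, so real checkpoint files are somewhat larger.
NOMINAL_BITS = {"BF16": 16, "FP8": 8, "NVFP4": 4, "Q4 GGUF": 4}


def weight_memory_gb(params_billion: float, fmt: str) -> float:
    """Rough weight-only footprint in decimal GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    bits = NOMINAL_BITS[fmt]
    # params * bits / 8 bits per byte; the 1e9 factors cancel out.
    return params_billion * bits / 8


if __name__ == "__main__":
    for fmt in NOMINAL_BITS:
        print(f"~35B params in {fmt}: ~{weight_memory_gb(35, fmt):.1f} GB")
```

Under these assumptions the weights alone come to about 70 GB in BF16, 35 GB in FP8, and 17.5 GB at 4 bits, which suggests why the 4-bit variants are the natural fit for local inference.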

Capabilities and tasks

| Term | Definition |
|---|---|
| Vision-Language Model (VLM) | A model that understands both visual inputs (such as UI screenshots) and text, which lets it interpret interfaces and perform actions. |
| GUI Agents | AI agents that operate graphical user interfaces by observing screens, reasoning about them, and executing actions. |
| Computer Use (CU) | The ability of an AI system to perform tasks on a computer, such as navigating interfaces and executing commands. |
| Navigation (in AI agents) | The process of completing tasks through multi-step reasoning and actions across interfaces. |
| Element Localization | Single-turn vision task: given a screenshot and a text description of a target UI element, return click coordinates. It is a grounding primitive that can be used inside larger agent harnesses. |
| Action Grounding | Mapping model decisions to concrete actions that can be executed in an environment. |
| Cross-environment Generalization | Ability to perform well across different platforms (web, desktop, mobile), including environments not seen during training. |
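Element Localization, Action Grounding, and Navigation fit together in a harness loop. The following is a minimal, hypothetical sketch of such a loop. It assumes the model's localization output is JSON with coordinates normalized to [0, 1] (for example `{"x": 0.5, "y": 0.25}`); the actual Holo3.1 output format is not specified here, and `Click`, `ground_action`, `run_episode`, and the callbacks are illustrative names, not part of the Models API. The loop observes a screenshot, asks the model for the next step, grounds the answer into a pixel click, and executes it until the model signals completion.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Click:
    """Hypothetical executable action: a left click at pixel (x, y)."""
    x: int
    y: int


def ground_action(model_output: str, width: int, height: int) -> Click:
    """Element localization output -> executable click (action grounding).

    Assumes (hypothetically) JSON like {"x": 0.5, "y": 0.25}, with both
    coordinates normalized to [0, 1] relative to the screenshot.
    """
    data = json.loads(model_output)
    x, y = float(data["x"]), float(data["y"])
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError(f"coordinates out of range: ({x}, {y})")
    # Scale to pixels, clamping 1.0 to the last valid pixel index.
    return Click(min(int(x * width), width - 1), min(int(y * height), height - 1))


def run_episode(
    take_screenshot: Callable[[], bytes],
    next_step: Callable[[bytes, str, List[Click]], Optional[str]],
    execute: Callable[[Click], None],
    task: str,
    width: int,
    height: int,
    max_steps: int = 10,
) -> List[Click]:
    """Multi-step navigation: observe, reason, ground, act until done."""
    history: List[Click] = []
    for _ in range(max_steps):
        screen = take_screenshot()                      # observe the current UI
        output = next_step(screen, task, history)       # model call; None means "task done"
        if output is None:
            break
        action = ground_action(output, width, height)   # ground the decision
        execute(action)                                 # act in the environment
        history.append(action)
    return history


if __name__ == "__main__":
    # Stub "model" that replays two scripted steps, then reports completion.
    scripted = iter(['{"x": 0.5, "y": 0.25}', '{"x": 1.0, "y": 1.0}', None])
    clicks = run_episode(
        take_screenshot=lambda: b"",
        next_step=lambda screen, task, history: next(scripted),
        execute=lambda c: print(f"click at ({c.x}, {c.y})"),
        task="Open the settings page",
        width=1920,
        height=1080,
    )
    print(len(clicks), "actions executed")
```

Injecting screenshot capture, the model call, and execution as callables keeps the loop independent of the environment, so the same code could drive a browser, a desktop VM, or a test stub like the one above.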

Benchmarks

| Term | Definition |
|---|---|
| OSWorld | Benchmark evaluating performance in real Ubuntu desktop environments. |
| WebVoyager / WebArena | Benchmarks for testing web navigation and task-completion abilities. |
| AndroidWorld | Benchmark for evaluating performance in Android mobile environments. |

Training and methods

| Term | Definition |
|---|---|
| Policy Learning | Training approach in which the model learns which action to take in a given situation. |
| Supervised Fine-Tuning (SFT) | Training stage in which the model learns from labeled examples. |
| Reinforcement Learning (GRPO) | Training method in which the model improves from feedback (rewards) on its actions. GRPO stands for Group Relative Policy Optimization. |
| Synthetic Data | Artificially generated data used to supplement training. |
| Human-Annotated Data | Data labeled by humans to improve model accuracy. |
| State-of-the-Art (SOTA) | Performance among the best currently available. |
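The GRPO entry becomes more concrete with its defining step. In Group Relative Policy Optimization, several rollouts are sampled for the same task, and each rollout's reward is normalized against the others in its group; the group mean serves as the baseline instead of a learned value model. The sketch below shows only that normalization, using hypothetical binary task-success rewards. It is not the Holo3.1 training code, and `group_relative_advantages`, the population standard deviation, and the `eps` value are choices of this sketch (implementations differ).

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts (GRPO-style advantages).

    All rewards must come from rollouts of the same task. Each advantage is
    (reward - group mean) / (group std + eps); eps avoids division by zero
    when every rollout receives the same reward.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 if the episode completed the task, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Successful rollouts get a positive advantage and failed ones a negative one, and a group where every rollout scores the same contributes no signal. In full GRPO these advantages then weight a clipped policy-gradient objective, usually with a KL penalty toward a reference model, which this sketch omits.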