Glossary - H Tech Hub

Key terms used across the Models API docs, grouped by theme.

Models and families

Term	Definition
Holo3.1	Latest-generation Vision-Language Model (VLM) family for GUI agents that interact with real digital environments (web, desktop, mobile).
Holo3.1 family	Model sizes from 0.8B to 35B-A3B, spanning on-device to server deployments.
Holo3.1-35B-A3B	Open-source (Apache 2.0) model variant, available in BF16, FP8, NVFP4, and Q4 GGUF for cloud and local inference.
Holo3	Prior generation that Holo3.1 builds on.
Holo2	Earlier-generation model that Holo3 improved upon.
Qwen/Qwen3.5-35B-A3B	Base model used for fine-tuning Holo3.1-35B-A3B.
Surfer-H	Example computer-use agent built on the Holo model family.
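
The precision formats listed for Holo3.1-35B-A3B mainly differ in how many bits each weight occupies. As a rough illustration only, the sketch below estimates weight storage per format. It assumes "35B" means about 35 billion total parameters and uses nominal bit widths; it ignores quantization scales, KV cache, activations, and runtime overhead, so real file sizes and memory use will differ.

```python
# Nominal bits per weight for each listed format (assumption: ignores
# quantization scales and any tensors kept at higher precision).
NOMINAL_BITS = {"BF16": 16, "FP8": 8, "NVFP4": 4, "Q4 GGUF": 4}

# Assumption: "35B" is read as roughly 35 billion total parameters.
TOTAL_PARAMS = 35e9


def weight_gb(params: float, bits: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for name, bits in NOMINAL_BITS.items():
    print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, halving the bit width halves the weight footprint, which is why the lower-precision variants are the ones suited to local inference.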

Capabilities and tasks

Term	Definition
Vision-Language Model (VLM)	A model that understands both visual inputs (like UI screens) and text, so it can interpret interfaces and perform actions.
GUI Agents	AI agents that operate graphical user interfaces by observing screens, reasoning about them, and executing actions.
Computer Use (CU)	The ability of an AI system to perform tasks on a computer, such as navigating interfaces and executing commands.
Navigation (in AI agents)	The process of completing tasks through multi-step reasoning and actions across interfaces.
Element Localization	Single-turn vision task: given a screenshot and a text description of a target UI element, return click coordinates. A grounding primitive that can be used inside larger agent harnesses.
Action Grounding	Connecting model decisions to actual executable actions in an environment.
Cross-environment Generalization	Ability to perform well across different platforms (web, desktop, mobile), including unseen environments.
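
To show how these terms fit together, here is a minimal sketch of a GUI agent loop that observes a screen, localizes a described element, and grounds the result in an executable click. Every name in it (`Click`, `run_agent`, `fake_localize`, the callback signatures) is hypothetical and not part of the Models API, and a fixed list of target descriptions stands in for the model's multi-step reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Click:
    """A grounded action: click at pixel coordinates (x, y)."""
    x: int
    y: int


# Hypothetical localizer signature: (screenshot_bytes, description) -> (x, y).
Localizer = Callable[[bytes, str], Tuple[int, int]]


def run_agent(
    steps: List[str],
    take_screenshot: Callable[[], bytes],
    localize: Localizer,
    execute: Callable[[Click], None],
) -> List[Click]:
    """Observe, localize each described target, then act on it."""
    performed: List[Click] = []
    for description in steps:
        screenshot = take_screenshot()            # observe the current screen
        x, y = localize(screenshot, description)  # element localization
        action = Click(x, y)                      # action grounding
        execute(action)                           # act in the environment
        performed.append(action)
    return performed


# Stub localizer so the sketch runs without a model or a real screen.
def fake_localize(screenshot: bytes, description: str) -> Tuple[int, int]:
    targets = {"search box": (120, 40), "submit button": (300, 40)}
    return targets[description]


log: List[Click] = []
clicks = run_agent(
    ["search box", "submit button"],
    take_screenshot=lambda: b"",
    localize=fake_localize,
    execute=log.append,
)
print(clicks)
```

In a real harness, the stub localizer would be replaced by a call to a VLM, and the fixed step list by the agent's own planning over each new screenshot.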

Benchmarks

Term	Definition
OSWorld	Benchmark evaluating performance in real Ubuntu desktop environments.
WebVoyager / WebArena	Benchmarks for testing web navigation and task completion abilities.
AndroidWorld	Benchmark evaluating performance in Android mobile environments.

Training and methods

Term	Definition
Policy Learning	Training method where the model learns which actions to take in different situations.
Supervised Fine-Tuning (SFT)	Training stage where the model learns from labeled examples.
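Reinforcement Learning (GRPO)	Training method where the model improves through reward feedback on the actions it takes. GRPO stands for Group Relative Policy Optimization, which scores each sampled output relative to the other outputs sampled for the same prompt.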
Synthetic Data	Artificially generated data used to supplement training.
Human-Annotated Data	Data labeled by humans to improve model accuracy.
State-of-the-Art (SOTA)	Performance that is among the best currently available.
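
As an illustration of the group-relative idea behind GRPO, the sketch below normalizes the rewards of several rollouts of the same task against that group's mean and standard deviation. It is a generic sketch, not a description of how Holo3.1 was trained: the function name, the `eps` term, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four sampled rollouts of the same task (1.0 = success).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Rollouts that beat their group's average get a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.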