- Multimodal input: send text and images together in one request.
- Structured outputs: receive responses you can act on directly for UI automation and navigation.
Two ways to use Holo
Agent loop
Multi-turn control for an autonomous agent.
Element localization
Get click coordinates from a screenshot.
Get started
Quickstart
Run Holo in five minutes.
API reference
Endpoint, models, pricing, and parameters.
Model cards and benchmarks
For model specs, weights, and performance, see the model cards on Hugging Face and the launch blog posts.
Hugging Face
Model cards, weights, and quantized builds.
Holo3.1 release
Mobile, function calling, and local inference.
Holo3 benchmarks
78.85% on OSWorld-Verified.
Prefer to try the models without writing code? HoloTab runs Holo directly in your browser, no setup required.