The Models API serves two Holo models. This page is the single source of truth for what is available; you can also query it programmatically with GET /v1/models.
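
As a rough illustration of the programmatic query, the sketch below builds a GET /v1/models request with the Python standard library. The base URL is a placeholder and Bearer-token authentication is an assumption; neither is stated on this page, so check your account settings for the real values.

```python
import json
import urllib.request

# Hypothetical base URL: replace it with the real API host for your account.
BASE_URL = "https://api.example.com"


def build_models_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/models",
        method="GET",
        # Assumption: the API authenticates with a Bearer token.
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def fetch_models(api_key: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_models_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect before anything is sent.
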
| Model ID | Architecture | Context | Max output | Input / output per 1M tokens | Native function calling | License |
|---|---|---|---|---|---|---|
| holo3-1-35b-a3b | MoE, 35B / 3B active | 65,536 | 4,096 | $0.25 / $1.80 | Yes | Apache 2.0 |
| holo3-122b-a10b | MoE, 122B / 10B active | 65,536 | 32,768 | $0.40 / $3.00 | No | Research only |
Both models accept text + images (JPEG, PNG, WebP; up to 5 images per request) and support the reasoning channel and structured outputs.
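
The image constraints above can be checked on the client before a request is sent. This is a minimal sketch: the helper name `validate_images` is hypothetical, and it assumes the three formats map to the standard MIME types image/jpeg, image/png, and image/webp.

```python
from __future__ import annotations

# Formats and per-request limit taken from this page; the MIME strings are the
# standard ones for JPEG, PNG, and WebP.
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp"}
MAX_IMAGES_PER_REQUEST = 5


def validate_images(mime_types: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the images pass these checks."""
    problems = []
    if len(mime_types) > MAX_IMAGES_PER_REQUEST:
        problems.append(
            f"{len(mime_types)} images supplied, limit is {MAX_IMAGES_PER_REQUEST}"
        )
    for index, mime_type in enumerate(mime_types):
        if mime_type.lower() not in ALLOWED_IMAGE_TYPES:
            problems.append(f"image {index}: unsupported type {mime_type!r}")
    return problems
```

The function collects every problem instead of stopping at the first, so a caller can report all of them at once.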

Holo3.1 35B (holo3-1-35b-a3b)

Fast, low-latency computer use across web, desktop, and mobile. Free tier (rate-limited, 10 RPM). Open weights on Hugging Face.

Holo3 122B (holo3-122b-a10b)

Maximum performance for complex tasks. Paid tier only. API-only: weights are not published; see the blog post for benchmarks.

Choosing a model

  • Start with holo3-1-35b-a3b: it is on the free tier, supports both output formats (structured outputs and native tool_calls), and its latency suits interactive agent loops.
  • Switch to holo3-122b-a10b when task complexity dominates: long multi-step navigation, dense reasoning, or cases where the 35B’s 4,096-token output cap is too tight (for example, long document transcriptions). It supports structured outputs but not native function calling.
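
The hard constraints in this guidance (output cap, native function calling, free tier) can be encoded as a small selector. This is only a sketch: `ModelSpec` and `choose_model` are hypothetical names, the figures come from the table on this page, and judging task complexity is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    max_output_tokens: int
    native_function_calling: bool
    free_tier: bool


# Ordered by preference: the 35B model is the suggested starting point.
MODELS = (
    ModelSpec("holo3-1-35b-a3b", 4096, True, True),
    ModelSpec("holo3-122b-a10b", 32768, False, False),
)


def choose_model(
    needed_output_tokens: int,
    needs_native_tools: bool = False,
    free_tier_only: bool = False,
) -> str:
    """Return the first model ID that satisfies every stated constraint."""
    for spec in MODELS:
        if needed_output_tokens > spec.max_output_tokens:
            continue
        if needs_native_tools and not spec.native_function_calling:
            continue
        if free_tier_only and not spec.free_tier:
            continue
        return spec.model_id
    raise ValueError("no served model satisfies these constraints")
```

A request that needs both more than 4,096 output tokens and native function calling raises `ValueError`, because neither served model offers that combination.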

Open weights and local inference

holo3-1-35b-a3b corresponds to the open-weight Holo3.1-35B-A3B release. The Holo3.1 collection on Hugging Face also carries the other family sizes (0.8B, 4B, 9B) and quantized FP8, GGUF, and NVFP4 builds; those are for self-hosting and are not served by this API. See run a local model server for a vLLM setup.

Model lifecycle

Model IDs are stable identifiers. When a model is scheduled for removal, its deprecation_date field is set in GET /v1/models and a notice appears here; after removal, requests to the old ID fail with a model_not_found error. Pin a model ID in production and check deprecation_date when you upgrade.
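
A deployment check for a pinned model ID could look like the sketch below. The response shape is an assumption: it supposes GET /v1/models returns a `data` list of objects with an `id` field and a `deprecation_date` that is either null or an ISO 8601 date string. Only the `deprecation_date` field name comes from this page, and `days_until_removal` is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def days_until_removal(payload: dict, model_id: str, today: date) -> Optional[int]:
    """Return the days left before removal, or None if no deprecation is scheduled.

    Assumes payload["data"] is a list of model objects (hypothetical shape).
    """
    for model in payload.get("data", []):
        if model.get("id") != model_id:
            continue
        raw = model.get("deprecation_date")
        if not raw:
            return None
        # Keep only the YYYY-MM-DD prefix so a full timestamp also parses.
        return (date.fromisoformat(raw[:10]) - today).days
    # The pinned ID is not listed at all; it may already have been removed.
    raise LookupError(f"{model_id} is not listed by the API")
```

Running a check like this in CI or at service start-up surfaces a scheduled removal before requests begin failing with model_not_found.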

Rate limits and billing

The free tier gives rate-limited access to holo3-1-35b-a3b (10 requests per minute) without a credit card. Create a key on Portal-H.
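
To stay under the 10 requests per minute limit, a client can throttle itself before calling the API. The sketch below uses a sliding 60-second window; how the server actually counts requests is not documented here, so the window model is an assumption, and `MinuteRateLimiter` is a hypothetical helper.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter (an assumed model of the server limit)."""

    def __init__(self, max_requests=10, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request fits in the window."""
        now = self._clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            return 0.0
        return self._window - (now - self._sent[0])

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            self._sleep(delay)
        self._sent.append(self._clock())
```

Call `acquire()` immediately before each request. The clock and sleep functions are injectable so the limiter can be tested without real delays.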