GET /v1/models.
| Model ID | Architecture | Context | Max output | Input / output per 1M tokens | Native function calling | License |
|---|---|---|---|---|---|---|
holo3-1-35b-a3b | MoE, 35B / 3B active | 65,536 | 4,096 | 1.80 | Yes | Apache 2.0 |
holo3-122b-a10b | MoE, 122B / 10B active | 65,536 | 32,768 | 3.00 | No | Research only |
Holo3.1 35B (holo3-1-35b-a3b)
Fast, low-latency computer use across web, desktop, and mobile. Free tier (rate-limited, 10 RPM). Open weights on Hugging Face.
Holo3 122B (holo3-122b-a10b)
Maximum performance for complex tasks. Paid tier only. API-only: weights are not published; see the blog post for benchmarks.
Choosing a model
- Start with
holo3-1-35b-a3b: it is on the free tier, supports both output formats (structured outputs and nativetool_calls), and its latency suits interactive agent loops. - Switch to
holo3-122b-a10bwhen task complexity dominates: long multi-step navigation, dense reasoning, or when the 35B’s 4,096-token output cap is too tight (for example long document transcriptions). It supports structured outputs but not native function calling.
Open weights and local inference
holo3-1-35b-a3b corresponds to the open-weight Holo3.1-35B-A3B release. The Holo3.1 collection on Hugging Face also carries the other family sizes (0.8B, 4B, 9B) and quantized FP8, GGUF, and NVFP4 builds; those are for self-hosting and are not served by this API. See run a local model server for a vLLM setup.
Model lifecycle
Model IDs are stable identifiers. When a model is scheduled for removal, itsdeprecation_date field is set in GET /v1/models and a notice appears here; after removal, requests to the old ID fail with a model_not_found error. Pin a model ID in production and check deprecation_date when you upgrade.
Rate limits and billing
Free tier
Free tier
Rate-limited access to
holo3-1-35b-a3b (10 requests per minute) without a credit card. Create a key on Portal-H.Paid tier
Paid tier
Higher rate limits plus access to
holo3-122b-a10b. Add credits on Portal-H. Billing is per model, per million input and output tokens; the API uses zero data retention by default. Current rates and FAQ: Models API pricing.