Models - H Tech Hub

The Models API serves two Holo models. This page is the single source of truth for what is available; you can also query it programmatically with GET /v1/models.
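A minimal sketch of the programmatic query follows. The base URL, bearer-token authentication, and the OpenAI-style response body (`{"data": [{"id": ...}]}`) are all assumptions for illustration; this page only confirms the `GET /v1/models` path.

```python
import json
import urllib.request

# Hypothetical host; replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_models_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /v1/models request (bearer auth is an assumption)."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list:
    """Extract model IDs, assuming a {"data": [{"id": ...}]} response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(api_key: str) -> list:
    """Send the request and return the IDs of the served models."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```

Keeping the request construction and the parsing in separate functions makes both easy to check without network access.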

Model ID	Architecture	Context (tokens)	Max output (tokens)	Price per 1M tokens (input / output)	Native function calling	License
`holo3-1-35b-a3b`	MoE, 35B / 3B active	65,536	4,096	$0.25 / $1.80	Yes	Apache 2.0
`holo3-122b-a10b`	MoE, 122B / 10B active	65,536	32,768	$0.40 / $3.00	No	Research only

Both models accept text + images (JPEG, PNG, WebP; up to 5 images per request) and support the reasoning channel and structured outputs.
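The image limits above can be checked on the client before a request is sent. This is a sketch only: `validate_images` is a hypothetical helper, and judging the format by file extension is an assumed heuristic, not how the API itself validates uploads.

```python
from pathlib import Path

# Limits taken from this page: JPEG, PNG, or WebP, at most 5 images per request.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGES_PER_REQUEST = 5


def validate_images(paths: list) -> None:
    """Raise ValueError if the image list would exceed the documented limits."""
    if len(paths) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"{len(paths)} images given; at most {MAX_IMAGES_PER_REQUEST} per request"
        )
    for path in paths:
        # Extension check only; the file contents are not inspected here.
        if Path(path).suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported image format: {path}")
```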

Holo3.1 35B (holo3-1-35b-a3b)

Fast, low-latency computer use across web, desktop, and mobile. Free tier (rate-limited, 10 RPM). Open weights on Hugging Face.

Holo3 122B (holo3-122b-a10b)

Maximum performance for complex tasks. Paid tier only. API-only: weights are not published; see the blog post for benchmarks.

Choosing a model

Start with holo3-1-35b-a3b: it is on the free tier, supports both output formats (structured outputs and native tool_calls), and its latency suits interactive agent loops.
Switch to holo3-122b-a10b when task complexity dominates: long multi-step navigation, dense reasoning, or output that does not fit within the 35B’s 4,096-token cap (for example, long document transcriptions). It supports structured outputs but not native function calling.
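The hard constraints in this guidance can be written as a small selection helper. This sketch encodes only the limits from the table (output cap and native function calling); `choose_model` is a hypothetical function, and judging task complexity remains up to you.

```python
SMALL = "holo3-1-35b-a3b"  # 4,096 max output tokens, native function calling
LARGE = "holo3-122b-a10b"  # 32,768 max output tokens, structured outputs only


def choose_model(needs_native_tool_calls: bool, max_output_tokens: int) -> str:
    """Return a model ID that satisfies the request, preferring the 35B."""
    if max_output_tokens <= 4_096:
        return SMALL
    if needs_native_tool_calls:
        # Only the 35B has native function calling, and it caps output at 4,096.
        raise ValueError(
            "native function calling is limited to 4,096 output tokens"
        )
    if max_output_tokens <= 32_768:
        return LARGE
    raise ValueError("no served model produces more than 32,768 output tokens")
```

For example, `choose_model(False, 20_000)` returns `holo3-122b-a10b`, while `choose_model(True, 8_000)` raises because no served model combines native tool calls with that output length.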

Open weights and local inference

holo3-1-35b-a3b corresponds to the open-weight Holo3.1-35B-A3B release. The Holo3.1 collection on Hugging Face also carries the other family sizes (0.8B, 4B, 9B) and quantized FP8, GGUF, and NVFP4 builds; those are for self-hosting and are not served by this API. See "Run a local model server" for a vLLM setup.

Model lifecycle

Model IDs are stable identifiers. When a model is scheduled for removal, its deprecation_date field is set in GET /v1/models and a notice appears here; after removal, requests to the old ID fail with a model_not_found error. Pin a model ID in production and check deprecation_date when you upgrade.
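A deprecation check for a pinned model might look like the sketch below. It assumes the `GET /v1/models` body has the shape `{"data": [{"id": ..., "deprecation_date": ...}]}` and that the date is an ISO `YYYY-MM-DD` string; only the `deprecation_date` field name comes from this page.

```python
from datetime import date
from typing import Optional


def deprecation_date(payload: dict, model_id: str) -> Optional[date]:
    """Return the scheduled removal date for model_id, or None if none is set.

    Raises LookupError when the model is not listed at all, which is the case
    where API requests would fail with model_not_found.
    """
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            raw = entry.get("deprecation_date")
            # Assumed format: ISO 8601 calendar date, e.g. "2030-01-31".
            return date.fromisoformat(raw) if raw else None
    raise LookupError(f"{model_id} is not listed by /v1/models")
```

Running this check in a scheduled job or at deploy time gives advance warning before a pinned ID stops resolving.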

Rate limits and billing

Free tier

Rate-limited access to holo3-1-35b-a3b (10 requests per minute) without a credit card. Create a key on Portal-H.
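To stay under the free-tier limit, a client can throttle itself before each call. The sketch below assumes the limit behaves like a sliding 60-second window; how the server actually counts requests is not stated here, and `SlidingWindowLimiter` is a hypothetical helper.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds span."""

    def __init__(self, max_requests=10, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._sent = deque()     # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            return 0.0
        return self._window - (now - self._sent[0])

    def acquire(self) -> None:
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        if delay > 0:
            self._sleep(delay)
        self._sent.append(self._clock())
```

Call `limiter.acquire()` immediately before each API request; the first ten calls return at once, and later calls block until the oldest request leaves the window.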

Paid tier

Higher rate limits plus access to holo3-122b-a10b. Add credits on Portal-H. Billing is per model, per million input and output tokens; the API uses zero data retention by default. Current rates and FAQ: Models API pricing.
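As a worked example of per-token billing, the sketch below estimates the cost of a request from the rates in the table on this page. The rates are hard-coded for illustration and may drift from the pricing page, which remains authoritative; `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, copied from the table on this page.
RATES_PER_MILLION = {
    "holo3-1-35b-a3b": (Decimal("0.25"), Decimal("1.80")),
    "holo3-122b-a10b": (Decimal("0.40"), Decimal("3.00")),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    in_rate, out_rate = RATES_PER_MILLION[model_id]
    return (in_rate * input_tokens + out_rate * output_tokens) / Decimal(1_000_000)
```

For instance, 200,000 input tokens and 10,000 output tokens on holo3-122b-a10b come to 0.08 + 0.03 = 0.11 USD.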
