> ## Documentation Index > Fetch the complete documentation index at: https://hub.hcompany.ai/llms.txt > Use this file to discover all available pages before exploring further. # Models > The Holo models served by the Models API: capabilities, pricing, and lifecycle. The Models API serves two Holo models. This page is the single source of truth for what is available; you can also query it programmatically with [`GET /v1/models`](/models-api/list-models). | Model ID | Architecture | Context | Max output | Input / output per 1M tokens | Native function calling | License | | :---------------- | :--------------------- | :------ | :--------- | :--------------------------- | :---------------------- | :------------ | | `holo3-1-35b-a3b` | MoE, 35B / 3B active | 65,536 | 4,096 | $0.25 / $1.80 | Yes | Apache 2.0 | | `holo3-122b-a10b` | MoE, 122B / 10B active | 65,536 | 32,768 | $0.40 / $3.00 | No | Research only | Both models accept text + images (JPEG, PNG, WebP; up to 5 images per request) and support the reasoning channel and [structured outputs](/agent-loop#output-format-and-tool-calls). Fast, low-latency computer use across web, desktop, and mobile. Free tier (rate-limited, 10 RPM). Open weights on Hugging Face. Maximum performance for complex tasks. Paid tier only. API-only: weights are not published; see the blog post for benchmarks. ## Choosing a model * Start with `holo3-1-35b-a3b`: it is on the free tier, supports both output formats (structured outputs and native `tool_calls`), and its latency suits interactive agent loops. * Switch to `holo3-122b-a10b` when task complexity dominates: long multi-step navigation, dense reasoning, or when the 35B's 4,096-token output cap is too tight (for example long [document transcriptions](/document-ocr)). It supports structured outputs but not native function calling. ## Open weights and local inference `holo3-1-35b-a3b` corresponds to the open-weight Holo3.1-35B-A3B release. The [Holo3.1 collection on Hugging Face](https://huggingface.co/collections/Hcompany/holo31) also carries the other family sizes (0.8B, 4B, 9B) and quantized FP8, GGUF, and NVFP4 builds; those are for self-hosting and are not served by this API. See [run a local model server](/holo-desktop-cli/how-to/run-a-local-model-server) for a vLLM setup. ## Model lifecycle Model IDs are stable identifiers. When a model is scheduled for removal, its [`deprecation_date`](/models-api/list-models) field is set in `GET /v1/models` and a notice appears here; after removal, requests to the old ID fail with a `model_not_found` error. Pin a model ID in production and check `deprecation_date` when you upgrade. ## Rate limits and billing Rate-limited access to `holo3-1-35b-a3b` (10 requests per minute) without a credit card. Create a key on [Portal-H](https://portal.hcompany.ai/). Higher rate limits plus access to `holo3-122b-a10b`. Add credits on [Portal-H](https://portal.hcompany.ai/credits). Billing is per model, per million input and output tokens; the API uses zero data retention by default. Current rates and FAQ: [Models API pricing](https://hcompany.ai/holo-models-api).