# Quickstart

Holo3 is a family of efficient, high-performance Vision-Language Models (VLMs) designed for autonomous agents, UI automation, and multimodal AI applications.

| Model               | Active Parameters | Main Use Cases                     | License       |
| :------------------ | :---------------- | :--------------------------------- | :------------ |
| **Holo3-35B-A3B**   | 3B                | High-throughput, low-latency       | Apache 2.0    |
| **Holo3-122B-A10B** | 10B               | Maximum performance, complex tasks | Research only |

## Two ways to use Holo3

| Mode                          | Pattern                                                 | Output                                               | When to use                                                                  |
| :---------------------------- | :------------------------------------------------------ | :--------------------------------------------------- | :--------------------------------------------------------------------------- |
| [**Agent loop**](/agent-loop) | Multi-turn: conversation + screenshots → next tool call | `{note, thought, tool_call}` against your tool union | Holo3 as the brain of an autonomous browser or desktop agent                 |
| **Element localization**      | Single-turn: image + target description → coordinates   | `{x, y}` in `[0, 1000]`                              | UI grounding inside any external agent or pipeline (yours or someone else's) |

## Get started

Generate an API key on [Portal-H](https://portal.hcompany.ai/), export it, and point the OpenAI client at H Company's endpoint:

```bash
export HAI_API_KEY="your-api-key-here"
```

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hcompany.ai/v1/",
    api_key=os.environ["HAI_API_KEY"],  # raises KeyError if the key was not exported
)

MODEL_NAME = "holo3-35b-a3b"  # or "holo3-122b-a10b"
```

The same API and code paths work for both models; pick based on the latency/quality trade-off in the table above.

## Build an autonomous agent

Holo3 can act as the brain of a browser, desktop, or mobile agent: it reads the screen, plans a move, calls a tool, observes the result, and iterates. The [Agent loop guide](/agent-loop) covers the output JSON shape, chat layout, image budget, coordinate convention, and a complete loop you can drop into your harness.
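
The read, plan, act, observe cycle described above can be sketched schematically. This is not the loop from the Agent loop guide: `run_agent`, `take_screenshot`, `propose_step`, `run_tool`, the `"done"` stop signal, and the inner shape of `tool_call` are all hypothetical stand-ins. Only the `note`, `thought`, and `tool_call` field names come from this page. The model is replaced by a stub so the control flow can be run without an API key.

```python
from typing import Callable

# Hypothetical step shape: {"note": str, "thought": str, "tool_call": {"name": str, ...}}
Step = dict


def run_agent(
    task: str,
    take_screenshot: Callable[[], str],
    propose_step: Callable[[str, str, list], Step],
    run_tool: Callable[[dict], str],
    max_steps: int = 10,
) -> list:
    """Run an observe -> decide -> act loop and return the list of steps taken."""
    history: list = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                  # read the screen
        step = propose_step(task, screenshot, history)  # plan the next move
        if step["tool_call"]["name"] == "done":         # hypothetical stop signal
            history.append(step)
            break
        result = run_tool(step["tool_call"])            # call the tool
        history.append({**step, "result": result})      # record the result, then iterate
    return history


# Stubbed usage: a fake "model" that clicks once and then stops.
def fake_model(task: str, screenshot: str, history: list) -> Step:
    name = "done" if history else "click"
    return {"note": "", "thought": f"step {len(history)}", "tool_call": {"name": name}}


steps = run_agent("open settings", lambda: "<screenshot>", fake_model, lambda call: "ok")
print([s["tool_call"]["name"] for s in steps])  # ['click', 'done']
```

In a real harness, `propose_step` would wrap a chat completion call and `run_tool` would drive the browser or desktop; the `max_steps` cap is one simple way to keep a confused agent from looping forever.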

## Element localization

Pass Holo3 a screenshot (URL or base64 data URI) and a text description of an element, and it returns click coordinates normalized to `[0, 1000]` on each axis. The call is single-turn, with no history and no thinking: set `temperature=0.0` and `enable_thinking=False`. This makes it useful as a vision tool inside any agent.

```python
from pydantic import BaseModel, Field

SCREENSHOT_URL = "https://your-host/screenshot.png"  # or "data:image/png;base64,..."
SCREENSHOT_WIDTH, SCREENSHOT_HEIGHT = 1280, 720
ELEMENT = "the 'Sign in' button in the top-right corner"

class VisualLocalizerOutput(BaseModel):
    x: int = Field(ge=0, le=1000, description="X coordinate as integer in [0, 1000]")
    y: int = Field(ge=0, le=1000, description="Y coordinate as integer in [0, 1000]")

schema = VisualLocalizerOutput.model_json_schema()

prompt = (
    "Localize an element on the GUI image according to the provided target "
    "and output a click position.\n"
    f" * You must output a valid JSON following the format: {schema}\n"
    f" Your target is:\n{ELEMENT}"
)

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": SCREENSHOT_URL}},
            {"type": "text", "text": prompt},
        ],
    }],
    extra_body={
        "structured_outputs": {"json": schema},
        "chat_template_kwargs": {"enable_thinking": False},
    },
    temperature=0.0,
)

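# Validate the JSON reply, then scale the normalized [0, 1000] coordinates
# to pixel coordinates in the original screenshot.
point = VisualLocalizerOutput.model_validate_json(response.choices[0].message.content)
abs_x = int(point.x / 1000 * SCREENSHOT_WIDTH)
abs_y = int(point.y / 1000 * SCREENSHOT_HEIGHT)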
print(f"Click at ({abs_x}, {abs_y})")
```
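
Two small helpers can sit on either side of the call above; this is a minimal sketch, not part of any H Company SDK. `png_to_data_uri` builds the base64 data URI form of the screenshot from raw PNG bytes, and `to_pixels` performs the same scaling as the example with one added assumption: valid pixel indices run from `0` to `width - 1`, so the result is clamped to keep a coordinate of `1000` inside the image. Both function names are hypothetical.

```python
import base64


def png_to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URI usable in the `image_url` field."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def to_pixels(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0, 1000] coordinates to pixel coordinates.

    Assumption: pixels are indexed 0..width-1 and 0..height-1, so the result
    is clamped to keep x=1000 or y=1000 from landing one past the edge.
    """
    px = min(int(x / 1000 * width), width - 1)
    py = min(int(y / 1000 * height), height - 1)
    return px, py


print(to_pixels(500, 500, 1280, 720))    # (640, 360)
print(to_pixels(1000, 1000, 1280, 720))  # (1279, 719)
```

For a local file, something like `png_to_data_uri(open("screenshot.png", "rb").read())` would produce a value suitable for `SCREENSHOT_URL`. If your click backend expects the right and bottom edges to map to `width` and `height` exactly, drop the clamp.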
