Welcome to the Holo3 Quickstart! Holo3 offers a family of efficient, high-performance Vision-Language Models (VLMs) designed for autonomous agents, UI automation, and multimodal AI applications. We currently support two models optimized for different trade-offs between speed, cost, and capability:
| Model | Active Parameters | Main Use Cases | License |
| --- | --- | --- | --- |
| Holo3-35B-A3B | 3B | High-throughput, low-latency | Apache 2.0 |
| Holo3-122B-A10B | 10B | Maximum performance, complex tasks | Research only |
Both models can process text and images, return structured text outputs, and are fully integrated with our OpenAI-compatible API.
  • Holo3-35B-A3B – Optimized for cost-efficient batch workloads and latency-sensitive automation.
  • Holo3-122B-A10B – Optimized for maximum performance and complex, high-context tasks.

Get started: Holo3-35B-A3B

Holo3-35B-A3B is our efficient Action Vision-Language Model. With only 3B active parameters, it delivers near-flagship performance at dramatically lower latency and cost. Ideal for high-throughput automation, latency-sensitive agents, and cost-efficient batch workloads.

Prerequisites

Holo3-35B-A3B is fully compatible with the standard OpenAI Chat Completions API. First, go to Portal-H to generate an API key. Export it to your local environment:
export HAI_API_KEY="your-api-key-here"

Initialization

import os
from openai import OpenAI

# Initialize the client pointing to H Company's inference platform
client = OpenAI(
    base_url="https://api.hcompany.ai/v1/",
    api_key=os.environ.get("HAI_API_KEY")
)

MODEL_NAME = "holo3-35b-a3b"
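
If the environment variable is missing, the client above is created with api_key=None and requests fail later with a less obvious error. A minimal sketch of a fail-fast guard you could run before initializing the client (the helper name is our own, not part of the SDK):

```python
import os

def require_api_key(var: str = "HAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before creating the client.")
    return key
```

You would then pass api_key=require_api_key() when constructing the OpenAI client.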

Text Query

For standard conversational or reasoning tasks, you can prompt Holo3-35B-A3B just like a standard LLM.
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "user", "content": "What are the advantages of using an efficient 3B active parameter model for UI automation?"}
    ]
)

print(response.choices[0].message.content)
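
For interactive agents you may want tokens as they are generated rather than a single blocking response. Assuming the endpoint honors the standard OpenAI stream=True flag (not confirmed by this page), a sketch of a streaming helper:

```python
def stream_answer(client, model: str, prompt: str) -> str:
    """Stream a completion chunk by chunk and return the accumulated text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the initial role header) carry no text
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)
```

Call it as stream_answer(client, MODEL_NAME, "your prompt") with the client initialized above.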

Image Query

Holo3-35B-A3B is natively multimodal. You can send an image to the Chat Completions API in two ways: by passing an image URL or by passing a base64-encoded image.

Image URL

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the main subjects in this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://picsum.photos/id/237/500/500"
                    }
                }
            ],
        }
    ]
)

print(response.choices[0].message.content)
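
Base64 Image

The second option is to embed a local image as a base64 data URL, following the standard OpenAI image_url format. A minimal sketch (the image_message helper is our own, not part of the SDK):

```python
import base64
import mimetypes

def image_message(prompt: str, image_path: str) -> dict:
    """Build a user message embedding a local image as a base64 data URL."""
    mime = mimetypes.guess_type(image_path)[0] or "image/png"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }

# Usage with the client initialized above:
# response = client.chat.completions.create(
#     model=MODEL_NAME,
#     messages=[image_message("Describe this screenshot.", "screenshot.png")],
# )
```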

Structured Output

For agentic workflows, you often need the model to output exact coordinates or strictly formatted JSON. You can define your schema using Pydantic and pass it to the API.
import json
from pydantic import BaseModel, Field

# Define your expected output schema
class ClickCoordinates(BaseModel):
    x: int = Field(description="The x coordinate on the screen.")
    y: int = Field(description="The y coordinate on the screen.")

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user", 
            "content": [
                {"type": "text", "text": "Find the location of the black puppy and output the exact click coordinates."},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/20/500/500"}}
            ]
        }
    ],
    # Enforce structured JSON output matching your Pydantic schema
    extra_body={
        "structured_outputs": {
            "json": ClickCoordinates.model_json_schema()
        }
    },
    temperature=0.0 # Recommended for precise localization tasks
)

# Parse the guaranteed JSON response
click_data = ClickCoordinates.model_validate_json(response.choices[0].message.content)
print(f"Target located at X: {click_data.x}, Y: {click_data.y}")
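
The schema guarantees well-formed JSON, but not that the coordinates fall inside your viewport. Before dispatching a click, it is worth clamping the parsed values to the screen bounds; a minimal sketch, with the helper name and the 500x500 screen size chosen for illustration:

```python
def clamp_to_screen(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Clamp model-predicted click coordinates to the visible screen area."""
    return (
        min(max(x, 0), width - 1),
        min(max(y, 0), height - 1),
    )

# Usage with the parsed response above:
# safe_x, safe_y = clamp_to_screen(click_data.x, click_data.y, 500, 500)
```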

Enable/Disable Thinking (Reasoning Mode)

Holo3 is a native reasoning model: reasoning is enabled by default, with no extra parameters required. You can control this behavior with the thinking parameter inside the extra_body payload.
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "user", "content": "How do I navigate from the homepage to the account billing settings?"}
    ],
    # Reasoning is on by default; to change it, set the "thinking"
    # parameter inside extra_body.
)

print("Content:")
print(response.choices[0].message.content)

print("Reasoning content:")
print(response.choices[0].message.reasoning)