

Holo3 is a family of efficient, high-performance Vision-Language Models (VLMs) designed for autonomous agents, UI automation, and multimodal AI applications.
| Model | Active Parameters | Main Use Cases | License |
|---|---|---|---|
| Holo3-35B-A3B | 3B | High-throughput, low-latency | Apache 2.0 |
| Holo3-122B-A10B | 10B | Maximum performance, complex tasks | Research only |

Two ways to use Holo3

| Mode | Pattern | Output | When to use |
|---|---|---|---|
| Agent loop | Multi-turn: conversation + screenshots → next tool call | `{note, thought, tool_call}` against your tool union | Holo3 as the brain of an autonomous browser or desktop agent |
| Element localization | Single-turn: image + target description → coordinates | `{x, y}` in [0, 1000] | UI grounding inside any external agent or pipeline (yours or someone else's) |

Get started

Generate an API key on Portal-H, export it, and point the OpenAI client at H Company’s endpoint:
```shell
export HAI_API_KEY="your-api-key-here"
```

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hcompany.ai/v1/",
    api_key=os.environ.get("HAI_API_KEY"),
)

MODEL_NAME = "holo3-35b-a3b"  # or "holo3-122b-a10b"
```
The same API and code paths work for both models; pick based on the latency/quality trade-off in the table above.

Build an autonomous agent

Holo3 can act as the brain of a browser, desktop, or mobile agent: it reads the screen, plans a move, calls a tool, observes the result, and iterates. The Agent loop guide covers the output JSON shape, chat layout, image budget, coordinate convention, and a complete loop you can drop into your harness.
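The loop described above can be sketched offline. This is a minimal sketch, not the harness from the Agent loop guide: the tool names (`click`, `type_text`) and the internal layout of `tool_call` (a `name` plus an `arguments` dict) are assumptions for illustration; the `{note, thought, tool_call}` top-level shape comes from the table above. Check the Agent loop guide for the exact schema.

```python
import json

# Hypothetical tool union -- your harness defines these; the names here
# are illustrative, not part of the Holo3 API.
def click(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"

def type_text(text: str) -> str:
    return f"typed {text!r}"

TOOLS = {"click": click, "type_text": type_text}

def run_step(raw_output: str) -> str:
    """Parse one {note, thought, tool_call} step and dispatch the tool call."""
    step = json.loads(raw_output)
    call = step["tool_call"]
    tool = TOOLS[call["name"]]  # look up the tool in your union
    return tool(**call["arguments"])

# Example model output; the tool_call field layout is an assumption.
raw = json.dumps({
    "note": "Login page is visible.",
    "thought": "I should click the Sign in button.",
    "tool_call": {"name": "click", "arguments": {"x": 912, "y": 44}},
})
print(run_step(raw))  # -> clicked at (912, 44)
```

In a real agent, the result string (or a fresh screenshot) is appended to the conversation and the model is called again, repeating until a terminal tool is invoked.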

Element localization

Pass Holo3 a screenshot (URL or base64 data URI) and a text description of an element; get click coordinates back. The call is single-turn, with no conversation history and no thinking: set temperature=0.0 and enable_thinking=False. Useful as a vision tool inside any agent.
```python
from pydantic import BaseModel, Field

SCREENSHOT_URL = "https://your-host/screenshot.png"  # or "data:image/png;base64,..."
SCREENSHOT_WIDTH, SCREENSHOT_HEIGHT = 1280, 720
ELEMENT = "the 'Sign in' button in the top-right corner"

class VisualLocalizerOutput(BaseModel):
    x: int = Field(ge=0, le=1000, description="X coordinate as integer in [0, 1000]")
    y: int = Field(ge=0, le=1000, description="Y coordinate as integer in [0, 1000]")

schema = VisualLocalizerOutput.model_json_schema()

prompt = (
    "Localize an element on the GUI image according to the provided target "
    "and output a click position.\n"
    f" * You must output a valid JSON following the format: {schema}\n"
    f" Your target is:\n{ELEMENT}"
)

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": SCREENSHOT_URL}},
            {"type": "text", "text": prompt},
        ],
    }],
    extra_body={
        "structured_outputs": {"json": schema},
        "chat_template_kwargs": {"enable_thinking": False},
    },
    temperature=0.0,
)

point = VisualLocalizerOutput.model_validate_json(response.choices[0].message.content)
abs_x = int(point.x / 1000 * SCREENSHOT_WIDTH)
abs_y = int(point.y / 1000 * SCREENSHOT_HEIGHT)
print(f"Click at ({abs_x}, {abs_y})")
```
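The normalized-to-pixel conversion at the end of the example can be wrapped in a small reusable helper (the function name `to_pixels` is ours; the arithmetic is exactly the [0, 1000] → pixel mapping used above):

```python
def to_pixels(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Map Holo3's normalized [0, 1000] coordinates to absolute pixels."""
    return int(x / 1000 * width), int(y / 1000 * height)

# Center of a 1280x720 screenshot:
print(to_pixels(500, 500, 1280, 720))  # -> (640, 360)
```

Keep the original screenshot dimensions around: the model only ever sees (and returns) the normalized frame, so the conversion must use the size of the image you actually sent.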