Welcome to the Holo3 Quickstart! Holo3 offers a family of efficient, high-performance Vision-Language Models (VLMs) designed for autonomous agents, UI automation, and multimodal AI applications. We currently support two models optimized for different trade-offs between speed, cost, and capability:
| Model | Active Parameters | Main Use Cases | License |
| --- | --- | --- | --- |
| Holo3-35B-A3B | 3B | High-throughput, low-latency | Apache 2.0 |
| Holo3-122B-A10B | 10B | Maximum performance, complex tasks | Research only |
Both models can process text and images, return structured text outputs, and are fully integrated with our OpenAI-compatible API.
  • Holo3-35B-A3B – Optimized for cost-efficient batch workloads and latency-sensitive automation.
  • Holo3-122B-A10B – Optimized for maximum performance and complex, high-context tasks.

Get started: Holo3-35B-A3B

Holo3-35B-A3B is our efficient Action Vision-Language Model. With only 3B active parameters, it delivers near-flagship performance at dramatically lower latency and cost. Ideal for high-throughput automation, latency-sensitive agents, and cost-efficient batch workloads.

Prerequisites

Holo3-35B-A3B is fully compatible with the standard OpenAI Chat Completions API. First, go to Portal-H to generate an API key. Export it to your local environment:
export HAI_API_KEY="your-api-key-here"

Initialization

import os
from openai import OpenAI

# Initialize the client pointing to H Company's inference platform
client = OpenAI(
    base_url="https://api.hcompany.ai/v1/",
    api_key=os.environ.get("HAI_API_KEY")
)

MODEL_NAME = "holo3-35b-a3b"
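
If the environment variable is missing, the client above is created with api_key=None and requests fail later with a less obvious error. A minimal sketch of a fail-fast guard you could run before initializing the client (the helper name is our own, not part of the SDK):

```python
import os

def require_api_key(var: str = "HAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before creating the client.")
    return key
```

You would then pass api_key=require_api_key() when constructing the OpenAI client.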

Text Query

For standard conversational or reasoning tasks, you can prompt Holo3-35B-A3B just like a standard LLM.
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "user", "content": "What are the advantages of using an efficient 3B active parameter model for UI automation?"}
    ]
)

print(response.choices[0].message.content)
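
For interactive agents you may want tokens as they are generated rather than a single blocking response. Assuming the endpoint honors the standard OpenAI stream=True flag (not confirmed by this page), a sketch of a streaming helper:

```python
def stream_answer(client, model: str, prompt: str) -> str:
    """Stream a completion chunk by chunk and return the accumulated text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the initial role header) carry no text
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)
```

Call it as stream_answer(client, MODEL_NAME, "your prompt") with the client initialized above.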

Image Query

Holo3-35B-A3B is natively multimodal. You can send an image to the Chat Completions API in two ways: by passing an image URL or by passing a base64-encoded image.

Image URL

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the main subjects in this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://picsum.photos/id/237/500/500"
                    }
                }
            ],
        }
    ]
)

print(response.choices[0].message.content)
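
Base64 Image

The second option is to embed a local image as a base64 data URL, following the standard OpenAI image_url format. A minimal sketch (the image_message helper is our own, not part of the SDK):

```python
import base64
import mimetypes

def image_message(prompt: str, image_path: str) -> dict:
    """Build a user message embedding a local image as a base64 data URL."""
    mime = mimetypes.guess_type(image_path)[0] or "image/png"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }

# Usage with the client initialized above:
# response = client.chat.completions.create(
#     model=MODEL_NAME,
#     messages=[image_message("Describe this screenshot.", "screenshot.png")],
# )
```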

Structured Output

For agentic workflows, you often need the model to output exact coordinates or strictly formatted JSON. You can define your schema using Pydantic and pass it to the API.
import json
from pydantic import BaseModel, Field

# Define your expected output schema
class ClickCoordinates(BaseModel):
    x: int = Field(description="The x coordinate on the screen.")
    y: int = Field(description="The y coordinate on the screen.")

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user", 
            "content": [
                {"type": "text", "text": "Find the location of the black puppy and output the exact click coordinates."},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/20/500/500"}}
            ]
        }
    ],
    # Enforce structured JSON output matching your Pydantic schema
    extra_body={
        "structured_outputs": {
            "json": ClickCoordinates.model_json_schema()
        }
    },
    temperature=0.0 # Recommended for precise localization tasks
)

# Parse the guaranteed JSON response
click_data = ClickCoordinates.model_validate_json(response.choices[0].message.content)
print(f"Target located at X: {click_data.x}, Y: {click_data.y}")
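
The schema guarantees well-formed JSON, but not that the coordinates fall inside your viewport. Before dispatching a click, it is worth clamping the parsed values to the screen bounds; a minimal sketch, with the helper name and the 500x500 screen size chosen for illustration:

```python
def clamp_to_screen(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Clamp model-predicted click coordinates to the visible screen area."""
    return (
        min(max(x, 0), width - 1),
        min(max(y, 0), height - 1),
    )

# Usage with the parsed response above:
# safe_x, safe_y = clamp_to_screen(click_data.x, click_data.y, 500, 500)
```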

Enable/Disable Thinking (Reasoning Mode)

Holo3 is a native reasoning model: reasoning is enabled by default, with no extra parameters required. You can control this behavior with the thinking parameter inside the extra_body payload.
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "user", "content": "How do I navigate from the homepage to the account billing settings?"}
    ],
    # Reasoning is on by default; to change it, set the "thinking"
    # parameter inside extra_body.
)

print("Content:")
print(response.choices[0].message.content)

print("Reasoning content:")
print(response.choices[0].message.reasoning)