# Create chat completion

> OpenAI-compatible chat completion with Holo-specific structured outputs and reasoning.

This is the only inference endpoint. It is OpenAI-compatible: the official OpenAI clients work unchanged once `base_url` points at `https://api.hcompany.ai/v1/`. Holo-specific behavior (structured outputs, the reasoning toggle) is controlled by the extra body fields documented below.

**Returns** a chat completion object, or a stream of chunk objects when `stream` is `true`.

***

## Body parameters

<ParamField body="model" type="string" required>
  Model ID to run. One of the IDs listed on the [Models](/models) page, e.g. `holo3-1-35b-a3b`.
</ParamField>

<ParamField body="messages" type="array" required>
  The conversation so far. Standard OpenAI message objects (`role`, `content`); `content` can be a string or an array of `text` and `image_url` parts. Images accept HTTPS URLs or base64 data URIs (JPEG, PNG, WebP), up to 5 per request.
</ParamField>
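
A minimal sketch of building one `messages` entry that mixes text and images, assuming the standard OpenAI content-part shape for `image_url` parts. The helper names `image_part` and `user_message` are illustrative and not part of any SDK; the accepted formats and the 5-image limit are taken from the description above.

```python
import base64

MAX_IMAGES = 5  # per-request limit stated for this endpoint
ALLOWED_MIME = {"image/jpeg", "image/png", "image/webp"}


def image_part(data: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url part (base64 data URI)."""
    if mime not in ALLOWED_MIME:
        raise ValueError(f"unsupported image type: {mime}")
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def user_message(text: str, images: list, mime: str = "image/png") -> dict:
    """Build one user message: a text part followed by image parts.

    Only this message is checked against the limit; a real request should
    count images across every message it sends.
    """
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per request")
    parts = [{"type": "text", "text": text}]
    parts += [image_part(img, mime) for img in images]
    return {"role": "user", "content": parts}
```

The returned dict can be placed directly in the `messages` list; an HTTPS URL can be used in place of the data URI in the `url` field.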

<ParamField body="structured_outputs" type="object">
  Holo-specific. Constrain the response, at the decoding level, to a JSON object matching a schema: pass `{"json": <JSON Schema>}`. The object is returned in `message.content`. Use this for the [structured-output agent loop](/agent-loop) and [element localization](/element-localization).

  <Note>
    With the OpenAI SDKs, pass this (and `chat_template_kwargs`) via `extra_body` in Python or an untyped spread in TypeScript; the SDK merges them into the request body. On the raw wire they are top-level fields, as in the cURL example. The API silently ignores anything nested under a literal `"extra_body"` key, so never send that key yourself in a raw request.
  </Note>
</ParamField>
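
As a hedged sketch, the helpers below assemble keyword arguments that constrain a reply to a small JSON object and then parse the result. The schema (`action`, `x`, `y`) and the names `CLICK_SCHEMA`, `structured_kwargs`, and `parse_structured` are made up for illustration; only the `{"json": ...}` wrapper and the `extra_body` routing come from this page.

```python
import json

# Hypothetical schema for illustration; use whatever shape your task needs.
CLICK_SCHEMA = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": ["click"]},
        "x": {"type": "integer"},
        "y": {"type": "integer"},
    },
    "required": ["action", "x", "y"],
}


def structured_kwargs(model: str, messages: list, schema: dict) -> dict:
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": messages,
        # The SDK merges extra_body into the top level of the request body.
        "extra_body": {"structured_outputs": {"json": schema}},
    }


def parse_structured(content: str) -> dict:
    """message.content carries the constrained JSON object as a string."""
    return json.loads(content)
```

With a configured client, this would be used as `resp = client.chat.completions.create(**structured_kwargs(...))` followed by `parse_structured(resp.choices[0].message.content)`.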

<ParamField body="chat_template_kwargs" type="object">
  Holo-specific. `{"enable_thinking": bool}` toggles the reasoning channel. Use `true` for agent loops (Holo plans before acting), `false` for single-shot calls like grounding and OCR.
</ParamField>

<ParamField body="reasoning_effort" type="string">
  How much the model plans before acting: `"low"`, `"medium"`, or `"high"`. `"medium"` is a sensible default for agent loops.
</ParamField>

<ParamField body="tools" type="array">
  OpenAI-style function declarations for [native function calling](/agent-loop#output-format-and-tool-calls). Supported by `holo3-1-35b-a3b` only. Set `tool_choice: "required"` so the model acts on every step, and do not mix with `structured_outputs`.
</ParamField>

<ParamField body="tool_choice" type="string">
  Standard OpenAI semantics. Use `"required"` in function-calling agent loops.
</ParamField>
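
The sketch below shows one way to assemble a function-calling request, assuming the standard OpenAI function-declaration format. The `click` tool and the helper `tool_kwargs` are hypothetical; the model restriction, `tool_choice: "required"`, and the rule against combining `tools` with `structured_outputs` come from the descriptions above.

```python
from typing import Optional

# Hypothetical tool for illustration, in the standard OpenAI function format.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at pixel coordinates on the current screenshot.",
        "parameters": {
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
            "required": ["x", "y"],
        },
    },
}


def tool_kwargs(messages: list, tools: list, extra_body: Optional[dict] = None) -> dict:
    """Keyword arguments for one function-calling step."""
    extra_body = dict(extra_body or {})
    if "structured_outputs" in extra_body:
        # This page says not to combine the two modes.
        raise ValueError("do not mix tools with structured_outputs")
    kwargs = {
        "model": "holo3-1-35b-a3b",  # the only model listed as supporting tools
        "messages": messages,
        "tools": tools,
        "tool_choice": "required",  # the model must act on every step
    }
    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs
```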

<ParamField body="stream" type="boolean" default="false">
  Stream the response as server-sent chunk events. Reasoning tokens arrive in `delta.reasoning`, content in `delta.content`.
</ParamField>

<ParamField body="max_tokens" type="integer">
  Output cap for this request. The hard per-model ceilings differ: 4,096 for `holo3-1-35b-a3b`, 32,768 for `holo3-122b-a10b` (see [Models](/models)).
</ParamField>

<ParamField body="temperature" type="number">
  Sampling temperature. Use `0.0` for deterministic single-shot calls ([localization](/element-localization), [OCR](/document-ocr)); `0.8` works well in agent loops. Also supported: `top_p`, `top_k`, `stop`, `frequency_penalty`, `presence_penalty`, `seed`.
</ParamField>
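
The sampling and reasoning suggestions above can be gathered into two presets. This is only a sketch: the names `AGENT_LOOP`, `SINGLE_SHOT`, and `with_preset` are hypothetical, and grouping the settings this way is an assumption for illustration rather than a documented configuration.

```python
import copy

# Agent loops: thinking on, medium effort, some sampling diversity.
AGENT_LOOP = {
    "temperature": 0.8,
    "reasoning_effort": "medium",
    "extra_body": {"chat_template_kwargs": {"enable_thinking": True}},
}

# Single-shot calls (grounding, OCR): thinking off, deterministic sampling.
SINGLE_SHOT = {
    "temperature": 0.0,
    "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
}


def with_preset(preset: dict, **overrides) -> dict:
    """Copy a preset and apply per-call overrides.

    Overrides replace top-level keys only, so passing extra_body replaces
    the preset's extra_body as a whole.
    """
    merged = copy.deepcopy(preset)
    merged.update(overrides)
    return merged
```

A call would then look like `client.chat.completions.create(**with_preset(SINGLE_SHOT, model=..., messages=...))`.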

***

## Response

<ResponseField name="choices[].message.content" type="string">
  The action or answer: the constrained JSON object (structured-output mode) or the assistant text. `null` when the model responded with `tool_calls` only.
</ResponseField>

<ResponseField name="choices[].message.reasoning" type="string">
  The thinking trace, present when thinking is enabled. Read it for visibility; do not feed it back into the conversation. The chat template drops it between turns, so anything the model must remember has to flow through `content`. See the [Agent loop](/agent-loop#reasoning) for carrying state forward.
</ResponseField>

<ResponseField name="choices[].message.tool_calls" type="array">
  Present in native function-calling mode only. Each call carries an `id` and a `function` object with `name` and a JSON-encoded `arguments` string.
</ResponseField>
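
Because `arguments` arrives as a JSON-encoded string, it needs one decoding step before use. The following is a minimal sketch that assumes the message has already been converted to a plain dict (for example with `message.model_dump()` in the Python SDK); `decode_tool_calls` is an illustrative name.

```python
import json


def decode_tool_calls(message: dict) -> list:
    """Turn message["tool_calls"] into records with parsed arguments."""
    calls = []
    # tool_calls may be missing or None outside function-calling mode.
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # arguments is a JSON string, not an object
            "arguments": json.loads(fn["arguments"]),
        })
    return calls
```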

<ResponseField name="choices[].finish_reason" type="string">
  `stop`, `length` (hit `max_tokens` or the model ceiling), or `tool_calls`.
</ResponseField>

<ResponseField name="usage" type="object">
  `prompt_tokens`, `completion_tokens`, `total_tokens` for the request.
</ResponseField>

***

## Examples

<CodeGroup>
  ```python Python
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.hcompany.ai/v1/",
      api_key=os.environ["HAI_API_KEY"],
  )

  resp = client.chat.completions.create(
      model="holo3-1-35b-a3b",
      messages=[{"role": "user", "content": "In one sentence, what is a computer-use agent?"}],
      reasoning_effort="medium",
      extra_body={"chat_template_kwargs": {"enable_thinking": True}},
  )

  print(resp.choices[0].message.content)
  ```

  ```typescript TypeScript
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://api.hcompany.ai/v1/",
    apiKey: process.env.HAI_API_KEY,
  });

  const resp = await client.chat.completions.create({
    model: "holo3-1-35b-a3b",
    messages: [{ role: "user", content: "In one sentence, what is a computer-use agent?" }],
    reasoning_effort: "medium",
    ...({ chat_template_kwargs: { enable_thinking: true } } as any),
  });

  console.log(resp.choices[0].message.content);
  ```

  ```bash cURL
  curl https://api.hcompany.ai/v1/chat/completions \
    -H "Authorization: Bearer $HAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "holo3-1-35b-a3b",
      "messages": [{"role": "user", "content": "In one sentence, what is a computer-use agent?"}],
      "reasoning_effort": "medium",
      "chat_template_kwargs": {"enable_thinking": true}
    }'
  ```
</CodeGroup>

### Streaming

<CodeGroup>
  ```python Python
  # Reuses `client` from the example above.
  stream = client.chat.completions.create(
      model="holo3-1-35b-a3b",
      messages=[{"role": "user", "content": "In one sentence, what is a computer-use agent?"}],
      stream=True,
  )

  for chunk in stream:
      if not chunk.choices:  # skip chunks that carry no choices
          continue
      delta = chunk.choices[0].delta
      if delta.content:
          print(delta.content, end="", flush=True)
  ```

  ```typescript TypeScript
  // Reuses `client` from the example above.
  const stream = await client.chat.completions.create({
    model: "holo3-1-35b-a3b",
    messages: [{ role: "user", content: "In one sentence, what is a computer-use agent?" }],
    stream: true,
  });

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) process.stdout.write(delta.content);
  }
  ```
</CodeGroup>
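
When thinking is enabled, reasoning tokens arrive in `delta.reasoning` alongside `delta.content`. The sketch below collects both channels separately. `collect_stream` and `fake_chunk` are illustrative names, and the stand-in chunks exist only so the snippet runs without a network call; with a real stream, pass the object returned by `client.chat.completions.create(..., stream=True)`.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning, content) strings."""
    reasoning, content = [], []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        # reasoning is not a typed SDK field, so read both defensively
        piece = getattr(delta, "reasoning", None)
        if piece:
            reasoning.append(piece)
        text = getattr(delta, "content", None)
        if text:
            content.append(text)
    return "".join(reasoning), "".join(content)


def fake_chunk(reasoning=None, content=None):
    """Stand-in for an SDK chunk object, for demonstration only."""
    delta = SimpleNamespace(reasoning=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning="Plan the answer. "),
    fake_chunk(content="Hello"),
    SimpleNamespace(choices=[]),
    fake_chunk(content=" world"),
]
thinking, answer = collect_stream(demo)
print(answer)  # Hello world
```

Keep the reasoning string for logging only; as noted under `choices[].message.reasoning`, it should not be fed back into the conversation.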
