# Create chat completion

> OpenAI-compatible chat completion with Holo-specific structured outputs and reasoning.

This is the single inference endpoint. It is OpenAI-compatible: the official OpenAI clients work as-is with `base_url` pointed at `https://api.hcompany.ai/v1/`. Holo-specific behavior (structured outputs, the reasoning toggle) is controlled by extra body fields documented below.

**Returns** a chat completion object, or a stream of chunk objects when `stream` is `true`.

***

## Body parameters

`model`: Model ID to run. One of the IDs listed on the [Models](/models) page, e.g. `holo3-1-35b-a3b`.

`messages`: The conversation so far. Standard OpenAI message objects (`role`, `content`); `content` can be a string or an array of `text` and `image_url` parts. Images can be HTTPS URLs or base64 data URIs (JPEG, PNG, WebP), up to 5 per request.

`structured_outputs`: Holo-specific. Constrains the response, at the decoding level, to a JSON object matching a schema: pass `{"json": <schema>}`. The object is returned in `message.content`. Use this for the [structured-output agent loop](/agent-loop) and [element localization](/element-localization).

With the OpenAI SDKs, pass this field (and `chat_template_kwargs`) via `extra_body` in Python or an untyped spread in TypeScript; the SDK merges them into the request body. On the raw wire they are top-level fields, as in the cURL example. The API silently ignores fields nested under a literal `"extra_body"` key.

`chat_template_kwargs`: Holo-specific. `{"enable_thinking": bool}` toggles the reasoning channel. Use `true` for agent loops (Holo plans before acting), `false` for single-shot calls like grounding and OCR.

`reasoning_effort`: How much the model plans before acting: `"low"`, `"medium"`, or `"high"`. `"medium"` is a sensible default for agent loops.
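To make the wire-format point about `structured_outputs` and `chat_template_kwargs` concrete, here is a minimal sketch of where the Holo-specific fields end up in the request body. The helper `build_request_body` and the click schema are hypothetical, the merge only imitates what the SDK's `extra_body` option is described as doing, and nothing is sent over the network.

```python
import json


def build_request_body(model, messages, extra_body=None, **standard_fields):
    """Hypothetical helper: merge extra fields into the TOP LEVEL of the body."""
    body = {"model": model, "messages": messages, **standard_fields}
    # Holo-specific fields sit next to `model` and `messages`;
    # they are not nested under a literal "extra_body" key.
    body.update(extra_body or {})
    return body


# Illustrative schema for a click action (an assumption, not taken from the Holo docs).
click_schema = {
    "type": "object",
    "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
    "required": ["x", "y"],
}

body = build_request_body(
    model="holo3-1-35b-a3b",
    messages=[{"role": "user", "content": "Click the Submit button."}],
    temperature=0.0,  # deterministic single-shot call
    extra_body={
        "structured_outputs": {"json": click_schema},
        "chat_template_kwargs": {"enable_thinking": False},
    },
)

print(json.dumps(body, indent=2))
```

The printed body should match the shape of a raw request such as the cURL call: `structured_outputs` and `chat_template_kwargs` appear as top-level keys.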
`tools`: OpenAI-style function declarations for [native function calling](/agent-loop#output-format-and-tool-calls). Supported by `holo3-1-35b-a3b` only. Set `tool_choice: "required"` so the model acts on every step, and do not combine with `structured_outputs`.

`tool_choice`: Standard OpenAI semantics. Use `"required"` in function-calling agent loops.

`stream`: Stream the response as server-sent chunk events. Reasoning tokens arrive in `delta.reasoning`, content in `delta.content`.

`max_tokens`: Output cap for this request. The hard per-model ceilings differ: 4,096 for `holo3-1-35b-a3b`, 32,768 for `holo3-122b-a10b` (see [Models](/models)).

`temperature`: Sampling temperature. Use `0.0` for deterministic single-shot calls ([localization](/element-localization), [OCR](/document-ocr)); `0.8` works well in agent loops.

Also supported: `top_p`, `top_k`, `stop`, `frequency_penalty`, `presence_penalty`, `seed`.

***

## Response

`message.content`: The action or answer: the constrained JSON object (structured-output mode) or the assistant text. `null` when the model responded with `tool_calls` only.

`message.reasoning`: The thinking trace, present when thinking is enabled. Read it for visibility; do not feed it back into the conversation. The chat template drops it between turns, so anything the model must remember has to flow through `content`. See the [Agent loop](/agent-loop#reasoning) for carrying state forward.

`message.tool_calls`: Present in native function-calling mode only. Each call carries an `id` and a `function` object with `name` and a JSON-encoded `arguments` string.

`finish_reason`: `stop`, `length` (hit `max_tokens` or the model ceiling), or `tool_calls`.

`usage`: `prompt_tokens`, `completion_tokens`, `total_tokens` for the request.
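The response fields above suggest a simple dispatch on each choice. The following is a minimal sketch, assuming the choice is available as a plain dictionary with the field names described in this section; the helper `interpret_choice`, the `click` tool name, and the sample payload are hypothetical and hand-written, not real API output.

```python
import json


def interpret_choice(choice: dict) -> dict:
    """Classify one entry of `choices` from a non-streaming response."""
    message = choice["message"]

    if choice.get("finish_reason") == "length":
        # Output hit `max_tokens` or the model ceiling, so JSON may be cut off.
        return {"kind": "truncated", "partial": message.get("content")}

    tool_calls = message.get("tool_calls")
    if tool_calls:
        calls = [
            {
                "id": call["id"],
                "name": call["function"]["name"],
                # `arguments` is a JSON-encoded string, so decode it.
                "arguments": json.loads(call["function"]["arguments"]),
            }
            for call in tool_calls
        ]
        return {"kind": "tool_calls", "calls": calls}

    # Plain text or structured-output JSON; `reasoning` is deliberately ignored.
    return {"kind": "content", "content": message.get("content")}


# Hand-written sample in the documented shape (not captured from the API).
sample_choice = {
    "finish_reason": "tool_calls",
    "message": {
        "content": None,
        "reasoning": "The Submit button is near the bottom right.",
        "tool_calls": [
            {
                "id": "call_1",
                "function": {"name": "click", "arguments": "{\"x\": 640, \"y\": 412}"},
            }
        ],
    },
}

result = interpret_choice(sample_choice)
print(result["kind"], result["calls"][0]["arguments"])
```

With the OpenAI Python SDK, something like `resp.choices[0].model_dump()` should produce a dictionary of this shape, though that is an assumption about the SDK version in use.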
***

## Examples

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hcompany.ai/v1/",
    api_key=os.environ["HAI_API_KEY"],
)

resp = client.chat.completions.create(
    model="holo3-1-35b-a3b",
    messages=[{"role": "user", "content": "In one sentence, what is a computer-use agent?"}],
    reasoning_effort="medium",
    # Holo-specific fields go through extra_body; the SDK merges them into the request body.
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(resp.choices[0].message.content)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.hcompany.ai/v1/",
  apiKey: process.env.HAI_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "holo3-1-35b-a3b",
  messages: [{ role: "user", content: "In one sentence, what is a computer-use agent?" }],
  reasoning_effort: "medium",
  // Untyped spread: Holo-specific fields are not part of the SDK's request type.
  ...({ chat_template_kwargs: { enable_thinking: true } } as any),
});
console.log(resp.choices[0].message.content);
```

```bash
curl https://api.hcompany.ai/v1/chat/completions \
  -H "Authorization: Bearer $HAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "holo3-1-35b-a3b",
    "messages": [{"role": "user", "content": "In one sentence, what is a computer-use agent?"}],
    "reasoning_effort": "medium",
    "chat_template_kwargs": {"enable_thinking": true}
  }'
```

### Streaming

```python
# Reuses the `client` created in the first Python example.
stream = client.chat.completions.create(
    model="holo3-1-35b-a3b",
    messages=[{"role": "user", "content": "In one sentence, what is a computer-use agent?"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        # Skip chunks that carry no choices.
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```

```typescript
// Reuses the `client` created in the first TypeScript example.
const stream = await client.chat.completions.create({
  model: "holo3-1-35b-a3b",
  messages: [{ role: "user", content: "In one sentence, what is a computer-use agent?" }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (delta?.content) process.stdout.write(delta.content);
}
```
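The streaming examples print only `delta.content`. When thinking is enabled, reasoning tokens arrive separately in `delta.reasoning`, so a consumer may want to keep the two channels apart. The sketch below is one possible way to do that. The helper names `collect_stream` and `fake_chunk` are hypothetical, and the stream is simulated with `SimpleNamespace` objects so that the logic can run without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning, content) strings."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # nothing to read on this chunk
        delta = chunk.choices[0].delta
        # `reasoning` is not a standard OpenAI field, so read it defensively.
        reasoning = getattr(delta, "reasoning", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**delta_fields):
    """Build a stand-in for one streamed chunk (test data only)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: reasoning first, then content, then an empty chunk.
fake_stream = [
    fake_chunk(reasoning="User wants "),
    fake_chunk(reasoning="a definition."),
    fake_chunk(content="A computer-use agent "),
    fake_chunk(content="operates a GUI."),
    SimpleNamespace(choices=[]),
]

reasoning, content = collect_stream(fake_stream)
print(content)
```

Passing the real `stream` object from the Python example to `collect_stream` should work the same way, assuming the SDK exposes the extra `reasoning` field as an attribute on `delta`. Only `content` should be carried into later turns; the reasoning string is for inspection.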