# Document OCR

Pass Holo a document page as an image and get clean Markdown back: headings, lists, tables, and equations, in reading order. There is no dedicated OCR endpoint. OCR uses the same OpenAI-compatible `chat/completions` call as any other request, with an image plus a transcription prompt. Send each page as a single request with `temperature=0.0` and `enable_thinking=False` so the model transcribes in one shot instead of reasoning first.

<Note>
  Holo OCR is strongest on **English, digitally generated documents** (exported PDFs, slides, web pages, reports). Scanned pages and photos are best-effort, and **handwriting is not a good fit**. For high-stakes handwritten or non-Latin content, use a dedicated OCR system.
</Note>

Set up the OpenAI client first by following the [Quickstart](/quickstart).

## Transcribe a page

Send one page image and read the Markdown from `message.content`.

<CodeGroup>
  ```python Python
  IMAGE_URL = "https://your-host/page.png"  # or "data:image/png;base64,..."

  OCR_PROMPT = (
      "Transcribe this document page to Markdown, preserving the reading order, "
      "headings, lists, and tables. Render tables as Markdown tables and equations "
      "as LaTeX. Return only the transcription, with no commentary and no surrounding "
      "code fence. If the page has no readable text, return an empty string."
  )

  response = client.chat.completions.create(
      model="holo3-1-35b-a3b",
      messages=[{
          "role": "user",
          "content": [
              {"type": "image_url", "image_url": {"url": IMAGE_URL}},
              {"type": "text", "text": OCR_PROMPT},
          ],
      }],
      temperature=0.0,
      extra_body={"chat_template_kwargs": {"enable_thinking": False}},
  )

  print(response.choices[0].message.content)
  ```

  ```typescript TypeScript
  const IMAGE_URL = "https://your-host/page.png"; // or "data:image/png;base64,..."

  const OCR_PROMPT =
    "Transcribe this document page to Markdown, preserving the reading order, " +
    "headings, lists, and tables. Render tables as Markdown tables and equations " +
    "as LaTeX. Return only the transcription, with no commentary and no surrounding " +
    "code fence. If the page has no readable text, return an empty string.";

  const response = await client.chat.completions.create({
    model: "holo3-1-35b-a3b",
    messages: [
      {
        role: "user",
        content: [
          { type: "image_url", image_url: { url: IMAGE_URL } },
          { type: "text", text: OCR_PROMPT },
        ],
      },
    ],
    temperature: 0.0,
    // chat_template_kwargs is H-specific, passed through in the request body
    ...({ chat_template_kwargs: { enable_thinking: false } } as any),
  });

  console.log(response.choices[0].message.content);
  ```
</CodeGroup>
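
The `IMAGE_URL` comment in the example above mentions a `data:image/png;base64,...` alternative to a hosted URL. One way to build that string from a local file is sketched below. The helper name `image_to_data_uri` and the PNG fallback for unknown extensions are assumptions made for illustration, not part of the Holo API.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI (hypothetical helper)."""
    # Guess the MIME type from the file extension.
    mime, _ = mimetypes.guess_type(path)
    # Assumption: treat unknown or non-image extensions as PNG.
    if mime is None or not mime.startswith("image/"):
        mime = "image/png"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

With this in place, `IMAGE_URL = image_to_data_uri("page.png")` could replace the hosted URL in the request.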

## Multi-page PDFs

Holo reads images, not PDFs, so rasterize each page to an image, transcribe one page per request, then stitch the results. One page per request keeps each image at full resolution and is the most reliable pattern.

<CodeGroup>
  ```python Python
  import base64
  import pymupdf  # pip install pymupdf

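  # Assumes `client` (from the Quickstart) and OCR_PROMPT (from the previous example) are in scope.
  def ocr_page(png_bytes: bytes) -> str:
      """Transcribe one PNG-encoded page image to Markdown."""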
      data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode()
      response = client.chat.completions.create(
          model="holo3-1-35b-a3b",
          messages=[{
              "role": "user",
              "content": [
                  {"type": "image_url", "image_url": {"url": data_uri}},
                  {"type": "text", "text": OCR_PROMPT},
              ],
          }],
          temperature=0.0,
          extra_body={"chat_template_kwargs": {"enable_thinking": False}},
      )
      return response.choices[0].message.content or ""

  with pymupdf.open("document.pdf") as doc:
      pages = [ocr_page(page.get_pixmap(dpi=200).tobytes("png")) for page in doc]

  markdown = "\n\n".join(pages)
  print(markdown)
  ```

  ```typescript TypeScript
  import { pdf } from "pdf-to-img"; // npm install pdf-to-img

  async function ocrPage(png: Buffer): Promise<string> {
    const dataUri = "data:image/png;base64," + png.toString("base64");
    const response = await client.chat.completions.create({
      model: "holo3-1-35b-a3b",
      messages: [
        {
          role: "user",
          content: [
            { type: "image_url", image_url: { url: dataUri } },
            { type: "text", text: OCR_PROMPT },
          ],
        },
      ],
      temperature: 0.0,
      ...({ chat_template_kwargs: { enable_thinking: false } } as any),
    });
    return response.choices[0].message.content ?? "";
  }

  const pages: string[] = [];
  for await (const page of await pdf("document.pdf", { scale: 2 })) {
    pages.push(await ocrPage(page));
  }

  const markdown = pages.join("\n\n");
  console.log(markdown);
  ```
</CodeGroup>

<Tip>
  Rasterize at roughly 150 to 200 DPI (or `scale: 2` in `pdf-to-img`). Lower resolution loses small text; much higher resolution wastes tokens without improving accuracy. Run pages concurrently to speed up long documents, within your [rate limit](/models#rate-limits-and-billing).
</Tip>
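
Running pages concurrently, as the tip suggests, could look like the sketch below. It takes the per-page function as a parameter, so the `ocr_page` function from the Python example can be passed in unchanged. The helper name `ocr_pages_concurrently` and the default of 4 workers are assumptions; pick a worker count that fits your rate limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def ocr_pages_concurrently(
    page_images: Iterable[bytes],
    ocr_fn: Callable[[bytes], str],
    max_workers: int = 4,  # assumption: tune to stay within your rate limit
) -> List[str]:
    """Transcribe pages in parallel and return the results in page order."""
    # executor.map yields results in input order, so the stitched
    # Markdown keeps the document's reading order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(ocr_fn, page_images))


# Possible usage with the PDF example (rasterize first, then fan out):
# with pymupdf.open("document.pdf") as doc:
#     images = [page.get_pixmap(dpi=200).tobytes("png") for page in doc]
# markdown = "\n\n".join(ocr_pages_concurrently(images, ocr_page))
```

Rasterizing all pages before fanning out keeps the PDF handling on a single thread, so only the network-bound OCR requests run in parallel.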

<Warning>
  `holo3-1-35b-a3b` caps output at 4,096 tokens per request, and a dense page (large tables, small print) can exceed that. Check `finish_reason` on each response and treat `length` as a truncated transcription. To recover, split the page image into smaller pieces, or switch to `holo3-122b-a10b` (32,768-token output cap) for dense documents. See [Models](/models).
</Warning>
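
The truncation check and the page split described in the warning can be sketched as follows. Both helper names, the two-band default, and the 40-pixel overlap are illustrative assumptions. The `finish_reason` value is read from `response.choices[0].finish_reason`.

```python
from typing import List, Optional, Tuple


def is_truncated(finish_reason: Optional[str]) -> bool:
    """Return True when generation stopped at the output token cap."""
    return finish_reason == "length"


def split_into_bands(
    height: int, parts: int = 2, overlap: int = 40
) -> List[Tuple[int, int]]:
    """Compute (top, bottom) pixel rows for horizontal bands covering a page.

    Adjacent bands overlap by up to `2 * overlap` rows, so a text line that
    straddles a boundary is more likely to appear whole in one of the bands.
    """
    if height < 1 or parts < 1:
        raise ValueError("height and parts must be positive")
    step = height / parts
    bands = []
    for i in range(parts):
        # Extend every band except the first upward, and every band
        # except the last downward, clamped to the page bounds.
        top = max(0, round(i * step) - (overlap if i > 0 else 0))
        bottom = min(
            height, round((i + 1) * step) + (overlap if i < parts - 1 else 0)
        )
        bands.append((top, bottom))
    return bands
```

When `is_truncated(...)` returns `True` for a page, one option is to crop the page image to each band (for example with Pillow's `Image.crop((0, top, width, bottom))`) and transcribe the bands as separate requests. Lines inside the overlap can appear twice in the stitched output, so this approach needs a deduplication step.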

## Next steps

<CardGroup cols={3}>
  <Card title="Element localization" icon="crosshairs" href="/element-localization">
    Get click coordinates from a screenshot.
  </Card>

  <Card title="Agent loop" icon="arrows-rotate" href="/agent-loop">
    How to use Holo in your computer-use harness.
  </Card>

  <Card title="API reference" icon="code" href="/api-reference">
    Endpoint, models, parameters, and limits.
  </Card>
</CardGroup>
