Pass Holo a document page as an image and get clean Markdown back: headings, lists, tables, and equations, in reading order. There is no dedicated OCR endpoint; it is the same OpenAI-compatible chat/completions call with an image plus a transcription prompt. Send it as a single request with temperature=0.0 and enable_thinking=False so the model transcribes in one shot instead of reasoning first.
Holo OCR is strongest on English, digitally generated documents (exported PDFs, slides, web pages, reports). Scanned pages and photos are best-effort, and handwriting is not a good fit. For high-stakes handwritten or non-Latin content, use a dedicated OCR system.
Set up the OpenAI client first by following the Quickstart.
Send one page image and read the Markdown from message.content.
    IMAGE_URL = "https://your-host/page.png"  # or "data:image/png;base64,..."

    OCR_PROMPT = (
        "Transcribe this document page to Markdown, preserving the reading order, "
        "headings, lists, and tables. Render tables as Markdown tables and equations "
        "as LaTeX. Return only the transcription, with no commentary and no surrounding "
        "code fence. If the page has no readable text, return an empty string."
    )

    response = client.chat.completions.create(
        model="holo3-1-35b-a3b",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
                {"type": "text", "text": OCR_PROMPT},
            ],
        }],
        temperature=0.0,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )

    print(response.choices[0].message.content)
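When the page image lives on disk rather than at a public URL, the data URL form mentioned in the comment above can be built with the standard library. This is a minimal sketch: the helper name `image_to_data_url` is hypothetical, and falling back to `image/png` for unknown extensions is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    # Guess the MIME type from the file extension; default to PNG (assumption).
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "image/png"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used wherever the request expects the image URL, for example `IMAGE_URL = image_to_data_url("page.png")`.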
Holo reads images, not PDFs, so rasterize each page to an image, transcribe one page per request, and stitch the results in page order. One page per request keeps each image at full resolution and is the most reliable pattern.
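The rasterize, transcribe, and stitch flow could be sketched as follows. This is an illustration under assumptions, not a prescribed pipeline. PyMuPDF (`fitz`) is one possible rasterizer and is not named by this guide. `rasterize_pdf` and `transcribe_document` are hypothetical names, and `transcribe_page` stands for whatever function sends one image to Holo and returns its Markdown.

```python
from typing import Callable, Iterable, List


def rasterize_pdf(pdf_path: str, dpi: int = 150) -> List[bytes]:
    """Render each PDF page to PNG bytes. Assumes PyMuPDF is installed."""
    import fitz  # PyMuPDF, imported lazily so the stitching code runs without it

    with fitz.open(pdf_path) as doc:
        return [page.get_pixmap(dpi=dpi).tobytes("png") for page in doc]


def transcribe_document(
    pages: Iterable[bytes],
    transcribe_page: Callable[[bytes], str],
    separator: str = "\n\n",
) -> str:
    """Transcribe pages one per request and join the Markdown in page order."""
    parts = []
    for png in pages:
        markdown = transcribe_page(png).strip()
        if markdown:  # skip pages that came back as an empty string
            parts.append(markdown)
    return separator.join(parts)
```

Keeping the transcription call behind a plain callable makes the stitching logic easy to test without network access.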
Rasterize at roughly 150 to 200 DPI (or scale: 2). Lower resolution loses small text; much higher wastes tokens without improving accuracy. Run pages concurrently to speed up long documents, within your rate limit.
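One way to run pages concurrently while keeping them in order is a bounded thread pool. The sketch below assumes a hypothetical `transcribe_page` callable that performs one request per page. The default of 4 workers is an arbitrary placeholder to be set according to your own rate limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def transcribe_concurrently(
    pages: Sequence[bytes],
    transcribe_page: Callable[[bytes], str],
    max_workers: int = 4,  # assumption: tune this to stay within your rate limit
) -> List[str]:
    """Transcribe pages in parallel and return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if pages finish out of order.
        return list(pool.map(transcribe_page, pages))
```

Threads suit this workload because each request spends most of its time waiting on the network. Note that an exception raised for any page propagates when its result is collected.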
holo3-1-35b-a3b caps output at 4,096 tokens per request, and a dense page (large tables, small print) can exceed that. Check finish_reason on each response and treat a value of length as a truncated transcription. To recover, split the page image into smaller pieces, or switch to holo3-122b-a10b (32,768-token output cap) for dense documents. See Models.
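A truncation check with a fallback to the larger model might look like the sketch below. The function name `transcribe_with_fallback` and the retry policy are assumptions for illustration. `client` is the OpenAI client from the Quickstart, and the request parameters mirror the single-page call shown earlier.

```python
SMALL_MODEL = "holo3-1-35b-a3b"   # 4,096-token output cap
LARGE_MODEL = "holo3-122b-a10b"   # 32,768-token output cap


def transcribe_with_fallback(client, image_url: str, prompt: str) -> str:
    """Try the small model first; retry on the large one if output was cut off."""
    for model in (SMALL_MODEL, LARGE_MODEL):
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }],
            temperature=0.0,
            extra_body={"chat_template_kwargs": {"enable_thinking": False}},
        )
        choice = response.choices[0]
        # finish_reason == "length" means the output hit the token cap.
        if choice.finish_reason != "length":
            return choice.message.content or ""
    raise RuntimeError("Truncated on both models; split the page image instead.")
```

If the larger model also truncates, the sketch raises instead of returning a partial page, leaving the caller to split the image.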