Skip to main content
Think of HoloDesktop CLI as a thin client around a local computer-use agent runtime. There are two main pieces:
  • the open-source Python client, distributed as holo-desktop-cli;
  • the computer-use agent runtime executable, hai-agent-runtime.
The client is the part you run from your terminal, MCP host, ACP host, or Python code. The runtime observes the screen, plans actions, clicks and types, and streams events back.
HoloDesktop CLI architecture: callers invoke the Python client, which starts the local runtime, controls visible desktop apps, writes local artifacts, and calls either hosted or local inference.
The same local runtime sits behind every surface. The main choice is where inference happens: H Company’s hosted Models API or an OpenAI-compatible endpoint you provide.

The same runtime behind every surface

The CLI, MCP server, ACP server, skill, and Python client all route work to the same local runtime. The surface changes how the agent is invoked. It does not create a separate kind of computer-use agent.
SurfaceUse it for
CLIRun one foreground desktop task from a terminal or script.
MCPLet Claude Code, Cursor, Codex, or another MCP host call HoloDesktop CLI as a tool.
ACPLet an ACP-compatible host delegate desktop work to HoloDesktop CLI as a sub-agent.
SkillGive a host a reusable HoloDesktop CLI instruction surface.
PythonStart sessions and stream events from application code.
For commands and flags, use CLI reference. For host setup, use the Integrations pages.

Runtime lifecycle

The Python client starts or attaches to hai-agent-runtime on loopback. If a healthy runtime is already listening on the target port, the client reuses it. Otherwise, it starts one. The runtime is local to your machine. It owns desktop observation and action. The client owns installation, launch, host integration, and user-facing commands. For exact cache paths, logs, token files, and run directories, use Paths and files.

Inference path

The runtime needs a model backend. HoloDesktop CLI has two modes:
  • hosted mode, where the runtime sends task-relevant model inputs to H Company’s Models API;
  • local mode, where the runtime sends model inputs to the OpenAI-compatible endpoint you provide.
The rest of the architecture is the same in both modes: the runtime still runs locally, controls the desktop locally, and writes local diagnostics. The difference is where model inference happens. For setup, use Hosted or local models. For privacy implications, use Security and privacy.

Desktop control

The agent operates desktop state, not source code or APIs. It observes what is visible, plans the next action, and uses desktop tools to click, type, scroll, and switch apps. Foreground state matters. A CLI task can move focus and use the active desktop while it runs. Host integrations may make it feel like a tool call, but the underlying action is still desktop operation.

User context

The client snapshots user context at run start and sends it to the runtime with the task. That context can include standing instructions, memories, rules, and installed skills from ~/.holo/. This makes the agent customizable, but those files can affect behavior. Keep persistent instructions specific and review them if a run behaves unexpectedly. For the exact files, use Paths and files.

Run artifacts

The runtime emits events as it works. Clients use those events to print progress, stream updates, and debug failures. The same stream is also persisted locally as run artifacts. Those artifacts are diagnostics, not a product analytics upload. In standalone HoloDesktop CLI mode, run traces stay on the user’s machine unless the user chooses to share them. For event structure, use Debug a failed run. For storage and privacy, use Paths and files and Security and privacy.

Verification pattern

The agent can observe and act, but strong workflows should still verify outputs separately when possible. Strong examples follow this pattern:
  1. Stage known inputs.
  2. Ask the agent to perform visible desktop work.
  3. Persist run artifacts.
  4. Verify the resulting files, UI state, or external state deterministically.
This keeps the roles clear: the agent performs the visible desktop work; the verifier decides whether the work met the contract. The expense-report example shows that pattern end to end.