The Browser (kind: "web") is the environment H ships today: a managed, remote web browser the platform provisions per session. Reference a built-in browser by catalog name (like "h/browser"), or define one inline in an agent’s environments list.

Modes

At each step the agent receives a fresh observation of the page, carried on the event stream so you can watch what it saw. The mode field sets what the observation contains and how the agent acts on the page:
  • visual (default): a screenshot with the current URL and open tabs (a web observation). The agent acts by viewport coordinates.
  • multimodal: the screenshot plus the visible page rendered as markdown text (also a web observation), for tasks that benefit from seeing and reading at once.
  • text: markdown only, no screenshot (a textual_web observation). The page is split into chunks the agent reads and pages through, navigating by URL. Read-only and coordinate-free, which makes it fast and cheap for search and research agents.
Reach for text when the task is reading and navigating (search, research, scraping); use visual or multimodal when the agent must click, type, or operate controls that only exist on the rendered page.

Each mode also fixes the set of actions available to the agent. The agent chooses among them autonomously as it works; you never call them directly, and there is no per-agent tool list to configure. To shape how the agent uses them, set its instructions.
| Action | Description | Visual | Multimodal | Text |
| --- | --- | --- | --- | --- |
| go_to_web | Navigate to a URL. | | | |
| go_back_web | Go back in the browser history. | | | |
| refresh_web | Refresh the current page. | | | |
| switch_tab_web | Switch to another tab, or open a new one. | | | |
| close_tab_web | Close a tab. | | | |
| click_web | Click at viewport coordinates. | | | |
| write | Focus an input at coordinates and type into it. | | | |
| select_option | Pick an option from a native <select> dropdown. | | | |
| move_mouse_web | Move the mouse to reveal hovers, tooltips, or menus. | | | |
| press_keys_web | Press keys or keyboard shortcuts. | | | |
| scroll_web | Scroll the page or a nested scrollable container. | | | |
| ctrl_f_web | Jump to the next on-page match of a text query. | | | |
| reader_mode | Extract the page’s main content as clean markdown. | | | |
| find_in_page | Find a text query and jump to the chunk that contains it. | | | |
| switch_chunk | Page forward or backward through the page’s text chunks. | | | |
| wait_web | Pause for the page to settle (up to 60 seconds). | | | |
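
The mode guidance above can be sketched as a small decision helper. This is only an illustration: pick_mode and its two boolean parameters are hypothetical names, not part of the platform, and the mapping simply restates which observation type each mode produces.

```python
from typing import Literal

Mode = Literal["visual", "multimodal", "text"]

# Observation type carried on the event stream for each mode,
# as described in the mode list above.
OBSERVATION_KIND: dict[str, str] = {
    "visual": "web",
    "multimodal": "web",
    "text": "textual_web",
}


def pick_mode(needs_interaction: bool, needs_page_text: bool = False) -> Mode:
    """Hypothetical helper: choose a mode from two task traits.

    needs_interaction: the agent must click, type, or operate page controls.
    needs_page_text:   the agent also benefits from reading the page as markdown.
    """
    if not needs_interaction:
        # Reading and navigating only: text mode is read-only and coordinate-free.
        return "text"
    return "multimodal" if needs_page_text else "visual"
```

For example, a research agent that only reads pages would call pick_mode(False) and get "text", while a form-filling agent that also needs to read long pages would call pick_mode(True, True) and get "multimodal".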

Configuration

When you define a Browser inline, every field except kind and mode is required; kind defaults to "web" and mode defaults to "visual". Reference a catalog entry instead and it supplies the fields for you.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Catalog identifier for the environment. |
| kind | string | No (defaults to "web") | Environment type. Currently only web. |
| headless | boolean | Yes | Run without a visible window. |
| width | integer | Yes | Viewport width in pixels. Must be a positive integer. |
| height | integer | Yes | Viewport height in pixels. Must be a positive integer. |
| start_url | string or null | Yes (nullable) | Initial URL to open. Pass null to start on a blank page. |
| mode | string | No (default "visual") | How the agent perceives and drives the browser: visual (screenshots + coordinate actions), multimodal (screenshots plus page markdown), or text (read-only markdown navigation, no screenshots). |
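
As a rough illustration of the field table, the sketch below checks an inline Browser definition held as a plain Python dict and fills in the two documented defaults. The field names and constraints come from the table; everything else is an assumption made for the example, including the validate_browser helper, the "research-browser" id, and the shape of the environments list. The platform's actual request format is not shown here.

```python
VALID_MODES = {"visual", "multimodal", "text"}
REQUIRED_FIELDS = ("id", "headless", "width", "height", "start_url")


def validate_browser(env: dict) -> dict:
    """Hypothetical check of an inline Browser definition against the field table."""
    missing = [name for name in REQUIRED_FIELDS if name not in env]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Apply the documented defaults, then let the caller's values override them.
    out = {"kind": "web", "mode": "visual", **env}

    if out["kind"] != "web":
        raise ValueError('kind must be "web"')
    if not isinstance(out["id"], str):
        raise ValueError("id must be a string")
    if not isinstance(out["headless"], bool):
        raise ValueError("headless must be a boolean")
    for dim in ("width", "height"):
        value = out[dim]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{dim} must be a positive integer")
    if out["start_url"] is not None and not isinstance(out["start_url"], str):
        raise ValueError("start_url must be a string or None")
    if out["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return out


research_browser = {
    "id": "research-browser",  # hypothetical identifier
    "headless": True,
    "width": 1280,
    "height": 800,
    "start_url": None,         # null: start on a blank page
    "mode": "text",            # read-only markdown navigation
}

# Assumed shape: a catalog reference and an inline definition side by side.
agent_environments = ["h/browser", validate_browser(research_browser)]
```

Note that start_url must be present even when it is None, which mirrors the "Yes (nullable)" entry in the table, while kind and mode may be omitted.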