The Browser (`kind: "web"`) is the environment H ships today: a managed, remote web browser that the platform provisions for each session. Reference a built-in browser by its catalog name (such as `"h/browser"`), or define one inline in an agent's `environments` list.
Modes
At each step the agent receives a fresh observation of the page, carried on the event stream so you can watch what it saw. The `mode` field sets what the observation contains and how the agent acts on the page:
- `visual` (default): a screenshot with the current URL and open tabs (a `web` observation). The agent acts by viewport coordinates.
- `multimodal`: the screenshot plus the visible page rendered as markdown text (also a `web` observation), for tasks that benefit from seeing and reading at once.
- `text`: markdown only, no screenshot (a `textual_web` observation). The page is split into chunks that the agent reads and pages through, navigating by URL. This mode is read-only and coordinate-free, which makes it fast and cheap for search and research agents.
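The three modes differ along a few independent axes: which observation type arrives on the event stream, whether it includes a screenshot or markdown, and whether the agent acts by coordinates. The minimal Python sketch below records those properties per mode, transcribed from the descriptions above. `ObservationShape` and `observation_shape` are hypothetical names for illustration, not part of any H SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationShape:
    """What one per-step observation carries (hypothetical model)."""
    event_type: str           # observation type on the event stream
    screenshot: bool          # includes a viewport screenshot
    markdown: bool            # includes page content as markdown text
    coordinate_actions: bool  # agent acts by viewport coordinates


# Transcribed from the mode descriptions; names are illustrative only.
OBSERVATION_SHAPES = {
    "visual": ObservationShape("web", screenshot=True, markdown=False, coordinate_actions=True),
    "multimodal": ObservationShape("web", screenshot=True, markdown=True, coordinate_actions=True),
    "text": ObservationShape("textual_web", screenshot=False, markdown=True, coordinate_actions=False),
}


def observation_shape(mode: str = "visual") -> ObservationShape:
    """Look up a mode's observation shape; `visual` is the default mode."""
    if mode not in OBSERVATION_SHAPES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(OBSERVATION_SHAPES)}")
    return OBSERVATION_SHAPES[mode]
```

Keeping the shapes in one frozen table makes the contrast explicit: `multimodal` is `visual` plus markdown, while `text` drops the screenshot and coordinates entirely.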
Use `text` when the task is reading and navigating (search, research, scraping); use `visual` or `multimodal` when the agent must click, type, or operate controls that exist only on the rendered page.
Each mode also fixes the set of actions available to the agent. The agent chooses among them autonomously as it works; you never call them directly, and there is no per-agent tool list to configure. To shape how the agent uses them, set its instructions.
| Action | Description | Visual | Multimodal | Text |
|---|---|---|---|---|
| `go_to_web` | Navigate to a URL. | ✓ | ✓ | ✓ |
| `go_back_web` | Go back in the browser history. | ✓ | ✓ | ✓ |
| `refresh_web` | Refresh the current page. | ✓ | ✓ | ✓ |
| `switch_tab_web` | Switch to another tab, or open a new one. | ✓ | ✓ | ✓ |
| `close_tab_web` | Close a tab. | ✓ | ✓ | ✓ |
| `click_web` | Click at viewport coordinates. | ✓ | ✓ | |
| `write` | Focus an input at coordinates and type into it. | ✓ | ✓ | |
| `select_option` | Pick an option from a native `<select>` dropdown. | ✓ | ✓ | |
| `move_mouse_web` | Move the mouse to reveal hovers, tooltips, or menus. | ✓ | ✓ | |
| `press_keys_web` | Press keys or keyboard shortcuts. | ✓ | ✓ | |
| `scroll_web` | Scroll the page or a nested scrollable container. | ✓ | ✓ | |
| `ctrl_f_web` | Jump to the next on-page match of a text query. | ✓ | ✓ | |
| `reader_mode` | Extract the page's main content as clean markdown. | ✓ | ✓ | |
| `find_in_page` | Find a text query and jump to the chunk that contains it. | | | ✓ |
| `switch_chunk` | Page forward or backward through the page's text chunks. | | | ✓ |
| `wait_web` | Pause for the page to settle (up to 60 seconds). | ✓ | ✓ | ✓ |
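For tooling or tests that need to know which actions a mode exposes, the action table can be encoded as data. The minimal Python sketch below assumes `find_in_page` and `switch_chunk` belong to text mode, as the chunked text observation implies. `ACTIONS_BY_MODE` and `actions_for` are hypothetical names; the agent still picks its actions on its own.

```python
# Action sets per mode, transcribed from the table (hypothetical reference model).
SHARED = frozenset({
    "go_to_web", "go_back_web", "refresh_web",
    "switch_tab_web", "close_tab_web", "wait_web",
})
# Actions that need the rendered page (coordinates, keyboard, scrolling).
RENDERED_PAGE = frozenset({
    "click_web", "write", "select_option", "move_mouse_web",
    "press_keys_web", "scroll_web", "ctrl_f_web", "reader_mode",
})
# Actions that page through the markdown chunks of a text observation.
TEXT_CHUNKS = frozenset({"find_in_page", "switch_chunk"})

ACTIONS_BY_MODE = {
    "visual": SHARED | RENDERED_PAGE,
    "multimodal": SHARED | RENDERED_PAGE,
    "text": SHARED | TEXT_CHUNKS,
}


def actions_for(mode: str = "visual") -> frozenset:
    """Return the actions a mode exposes; `visual` is the default mode."""
    if mode not in ACTIONS_BY_MODE:
        raise ValueError(f"unknown mode {mode!r}")
    return ACTIONS_BY_MODE[mode]


# Actions that only the screenshot-based modes offer.
print(sorted(actions_for("visual") - actions_for("text")))
```

Grouping the actions into shared, rendered-page, and text-chunk sets mirrors the table's structure: `visual` and `multimodal` expose identical actions and differ only in what the observation contains.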
Configuration
When you define a Browser inline, every field except `kind` and `mode` is required; if you reference a catalog entry instead, the entry supplies them for you.
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Catalog identifier for the environment. |
| `kind` | string | No (defaults to `"web"`) | Environment type. Currently only `web`. |
| `headless` | boolean | Yes | Run without a visible window. |
| `width` | integer | Yes | Viewport width in pixels. Must be a positive integer. |
| `height` | integer | Yes | Viewport height in pixels. Must be a positive integer. |
| `start_url` | string \| null | Yes (nullable) | Initial URL to open. Pass `null` to start on a blank page. |
| `mode` | string | No (defaults to `"visual"`) | How the agent perceives and drives the browser: `visual` (screenshots + coordinate actions), `multimodal` (screenshots plus page markdown), or `text` (read-only markdown navigation, no screenshots). |
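Before sending an inline definition, you might check it locally against the field rules above. The minimal Python sketch below assumes the definition is a plain dict keyed by the documented field names. `validate_browser` and the `"research-browser"` id are hypothetical. Showing the catalog reference as a bare string is also an assumption about the shape of the `environments` list. The platform's own validation remains authoritative.

```python
from typing import Any

REQUIRED = ("id", "headless", "width", "height", "start_url")
MODES = ("visual", "multimodal", "text")


def validate_browser(spec: dict[str, Any]) -> dict[str, Any]:
    """Check a hypothetical inline Browser dict; return it with defaults applied."""
    # start_url must be present even when its value is None (blank page).
    missing = [name for name in REQUIRED if name not in spec]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    out = {"kind": "web", "mode": "visual", **spec}  # apply documented defaults
    if out["kind"] != "web":
        raise ValueError("kind must be 'web' (the only environment type today)")
    if not isinstance(out["id"], str):
        raise ValueError("id must be a string")
    if not isinstance(out["headless"], bool):
        raise ValueError("headless must be a boolean")
    for dim in ("width", "height"):
        value = out[dim]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{dim} must be a positive integer")
    if out["start_url"] is not None and not isinstance(out["start_url"], str):
        raise ValueError("start_url must be a string or None")
    if out["mode"] not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    return out


# Hypothetical environments list: a catalog reference plus an inline definition.
environments = [
    "h/browser",
    validate_browser({
        "id": "research-browser",
        "headless": True,
        "width": 1280,
        "height": 800,
        "start_url": None,
        "mode": "text",
    }),
]
```

The check treats `start_url: None` as valid but a missing `start_url` key as an error, matching the table's "Yes (nullable)" requirement.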