How to Use OpenClaw's Agent-Browser for Web Automation
OpenClaw's browser control system gives AI agents the ability to interact with websites just like humans do. This tutorial covers the practical aspects of using OpenClaw's agent-browser for web automation, from basic navigation to complex form filling and data extraction.
Why Browser Automation for Agents?
Many tasks that agents need to perform require interacting with web interfaces: monitoring dashboards, submitting forms, extracting data from dynamic pages, or testing web applications. OpenClaw's browser tool provides a powerful interface for these tasks without requiring agents to understand complex web scraping libraries.
Getting Started
OpenClaw's browser control is built into the platform. No additional installation is required. The browser tool uses Playwright under the hood, providing reliable automation across Chromium, Firefox, and WebKit.
Basic Browser Launch
Start a browser session with a simple command:
{
"action": "start",
"profile": "openclaw"
}
This launches an isolated browser instance managed by OpenClaw. To take over an existing Chrome session instead (useful for authenticated workflows), use the chrome profile:
{
"action": "start",
"profile": "chrome"
}
The chrome profile connects to your existing Chrome browser via the Browser Relay extension, allowing you to work with already-logged-in sessions.
Navigating to Pages
Once your browser is running, navigate to any URL:
{
"action": "open",
"url": "https://example.com",
"profile": "openclaw"
}
OpenClaw loads the page and waits for it to reach a load state before returning control. To wait for a different state, for example on pages that render content dynamically, set loadState:
{
"action": "open",
"url": "https://example.com",
"profile": "openclaw",
"loadState": "domcontentloaded"
}
Options for loadState include:
- load: Wait for the load event (default)
- domcontentloaded: Wait for the DOMContentLoaded event
- networkidle: Wait for the network to be idle
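As a minimal sketch, the open request can be assembled by a small helper that rejects any load state other than the three listed above. The function name `open_request` and its default arguments are illustrative assumptions, not part of OpenClaw; only the payload fields come from this tutorial.

```python
# Hypothetical helper (not an OpenClaw API): builds the "open" payload
# described above and validates loadState against the three listed options.
from typing import Optional

LOAD_STATES = ("load", "domcontentloaded", "networkidle")


def open_request(url: str, profile: str = "openclaw",
                 load_state: Optional[str] = None) -> dict:
    """Return an "open" payload; loadState is omitted when not specified."""
    request = {"action": "open", "url": url, "profile": profile}
    if load_state is not None:
        if load_state not in LOAD_STATES:
            raise ValueError(f"loadState must be one of {LOAD_STATES}")
        request["loadState"] = load_state
    return request


request = open_request("https://example.com", load_state="domcontentloaded")
```

Catching a misspelled load state before the request is sent is cheaper than diagnosing a timeout afterwards.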
Taking Snapshots
Snapshots are the foundation of OpenClaw's browser automation. They capture the current state of the page in a structured format that agents can understand:
{
"action": "snapshot",
"profile": "openclaw"
}
A snapshot returns:
- Page title and URL
- All interactive elements with references (like "e12", "e45")
- Text content
- Form fields and their current values
- Links and buttons
Snapshot Output Example
URL: https://example.com/login
Title: Login - Example Site
[e1] textbox "Username"
[e2] textbox "Password"
[e3] button "Log In"
[e4] link "Forgot Password?"
These references (e1, e2, etc.) can be used in subsequent actions to interact with elements.
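If the snapshot reaches your own tooling as text, the refs can be pulled out with a regular expression. The sketch below assumes the output looks exactly like the example above (one element per line, in the form [ref] role "name"); the real format may differ, and `parse_snapshot` and `find_ref` are hypothetical names.

```python
# Sketch only: assumes snapshot text looks exactly like the example above,
# one element per line in the form: [ref] role "name".
import re
from typing import Dict, Optional

ELEMENT_LINE = re.compile(r'^\[(?P<ref>e\d+)\]\s+(?P<role>\w+)\s+"(?P<name>.*)"$')


def parse_snapshot(text: str) -> Dict[str, dict]:
    """Map each element ref to its role and name."""
    elements = {}
    for line in text.splitlines():
        match = ELEMENT_LINE.match(line.strip())
        if match:
            elements[match["ref"]] = {"role": match["role"], "name": match["name"]}
    return elements


def find_ref(elements: Dict[str, dict], role: str, name: str) -> Optional[str]:
    """Return the first ref whose role and name both match, or None."""
    for ref, info in elements.items():
        if info["role"] == role and info["name"] == name:
            return ref
    return None


SAMPLE = '''URL: https://example.com/login
Title: Login - Example Site
[e1] textbox "Username"
[e2] textbox "Password"
[e3] button "Log In"
[e4] link "Forgot Password?"'''

elements = parse_snapshot(SAMPLE)
print(find_ref(elements, "button", "Log In"))  # e3
```

Looking elements up by role and name rather than hard-coding a ref keeps a script working when the numbering shifts between snapshots.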
Interacting with Elements
Clicking Buttons and Links
Use the act action with a click kind:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "click",
"ref": "e3"
}
}
This clicks the element referenced as e3 in the most recent snapshot.
Filling Forms
Type into input fields using the type kind:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "type",
"ref": "e1",
"text": "myusername"
}
}
For forms with multiple fields, chain actions together:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "type",
"ref": "e1",
"text": "myusername"
}
}
Then:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "type",
"ref": "e2",
"text": "mypassword"
}
}
Finally:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "click",
"ref": "e3"
}
}
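The three requests above follow a regular pattern, so they can be generated from a field map. This is a sketch under the assumption that you build payloads in your own script; `form_steps` is a hypothetical helper, and how the payloads are delivered to the browser tool is left out.

```python
# Hypothetical helper: expands a field map into the ordered "act" payloads
# shown above (one "type" per field, then a final "click").
from typing import Dict, List


def form_steps(fields: Dict[str, str], submit_ref: str,
               profile: str = "openclaw") -> List[dict]:
    """Return act payloads: type into each ref in order, then click submit."""
    steps = [
        {"action": "act", "profile": profile,
         "request": {"kind": "type", "ref": ref, "text": text}}
        for ref, text in fields.items()
    ]
    steps.append({"action": "act", "profile": profile,
                  "request": {"kind": "click", "ref": submit_ref}})
    return steps


steps = form_steps({"e1": "myusername", "e2": "mypassword"}, submit_ref="e3")
```

Python dictionaries preserve insertion order, so the fields are typed in the order they are listed.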
Using Fill for Faster Input
The fill kind clears and sets the value instantly without simulating typing:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "fill",
"ref": "e1",
"text": "myusername"
}
}
This is faster than type and useful for long form fields.
Extracting Data
Snapshots automatically capture visible text and element properties. For custom data extraction, use the evaluate kind to run JavaScript:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "evaluate",
"fn": "() => { return Array.from(document.querySelectorAll('.price')).map(el => el.textContent); }"
}
}
This executes JavaScript in the page context and returns the result. Perfect for extracting structured data from complex pages.
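When the selector varies, the function string can be generated rather than written by hand. The sketch below is one way to do that; `text_of_all` is a hypothetical helper. Note that querySelectorAll returns a NodeList, which has no map method, so the generated JavaScript converts it with Array.from first.

```python
# Hypothetical helper: builds an "evaluate" payload that collects the text of
# every element matching a CSS selector. json.dumps quotes the selector so
# that quotes or backslashes in it cannot break the generated JavaScript.
import json


def text_of_all(selector: str, profile: str = "openclaw") -> dict:
    """Return an act/evaluate payload that maps matching elements to text."""
    fn = (
        "() => Array.from(document.querySelectorAll("
        + json.dumps(selector)
        + ")).map(el => el.textContent.trim())"
    )
    return {"action": "act", "profile": profile,
            "request": {"kind": "evaluate", "fn": fn}}


prices = text_of_all(".price")
```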
Practical Use Cases
Use Case 1: Monitoring a Dashboard
Check a status dashboard every hour:
{
"action": "open",
"url": "https://dashboard.example.com",
"profile": "openclaw"
}
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "evaluate",
"fn": "() => { return { status: document.querySelector('.status').textContent, uptime: document.querySelector('.uptime').textContent }; }"
}
}
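The two requests above perform a single check; the hourly repetition has to come from whatever drives the agent. As a rough sketch, assuming a hypothetical `send` callable that delivers one payload to the browser tool and returns its result (this tutorial does not show how payloads are actually dispatched), a polling loop could look like this:

```python
# Sketch of a repeated dashboard check. `send` is a hypothetical callable that
# delivers one payload to the browser tool and returns its result.
import time
from typing import Callable

STATUS_FN = ("() => { return { status: document.querySelector('.status').textContent, "
             "uptime: document.querySelector('.uptime').textContent }; }")


def check_dashboard(send: Callable[[dict], dict], url: str,
                    profile: str = "openclaw") -> dict:
    """Open the dashboard, then evaluate the status-extraction function."""
    send({"action": "open", "url": url, "profile": profile})
    return send({"action": "act", "profile": profile,
                 "request": {"kind": "evaluate", "fn": STATUS_FN}})


def monitor(send: Callable[[dict], dict], url: str, checks: int,
            interval_seconds: float = 3600.0, sleep=time.sleep) -> list:
    """Run `checks` dashboard checks, sleeping between consecutive ones."""
    results = []
    for i in range(checks):
        results.append(check_dashboard(send, url))
        if i < checks - 1:
            sleep(interval_seconds)
    return results
```

Passing `sleep` as a parameter lets the loop be exercised in tests without waiting an hour between checks.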
Use Case 2: Submitting Forms
Automate form submissions:
{
"action": "open",
"url": "https://forms.example.com/submit",
"profile": "openclaw"
}
{
"action": "snapshot",
"profile": "openclaw"
}
After identifying the field refs from the snapshot:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "fill",
"ref": "e5",
"text": "John Doe"
}
}
Continue for each field, then submit.
Use Case 3: Scraping Dynamic Content
Many modern websites load content via JavaScript. Static scrapers fail here. OpenClaw's browser automation handles this naturally:
{
"action": "open",
"url": "https://catalog.example.com",
"profile": "openclaw"
}
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "evaluate",
"fn": "() => { return Array.from(document.querySelectorAll('.product')).map(p => ({ name: p.querySelector('.name').textContent, price: p.querySelector('.price').textContent })); }"
}
}
Use Case 4: Testing Web Applications
Automated testing without writing Selenium or Playwright code:
{
"action": "open",
"url": "https://app.example.com/login",
"profile": "openclaw"
}
{
"action": "snapshot",
"profile": "openclaw"
}
Verify that the login form appears in the snapshot, then test the login flow.
Taking Screenshots
Capture visual state for debugging or verification:
{
"action": "screenshot",
"profile": "openclaw",
"fullPage": true
}
Screenshots are returned as attachments. Use fullPage: false to capture only the visible viewport.
Working with Multiple Tabs
OpenClaw supports multi-tab workflows via targetId:
- Open each tab with action: "open"
- Pass targetId to subsequent actions to specify which tab
{
"action": "open",
"url": "https://example.com/page1",
"profile": "openclaw"
}
The response includes a targetId, for example "page-abc123". To work with that specific tab:
{
"action": "snapshot",
"profile": "openclaw",
"targetId": "page-abc123"
}
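When several tabs are open, it helps to pin payloads to a tab in one place instead of editing each request by hand. A minimal sketch, where `for_tab` is a hypothetical helper and the targetId value is the one from the example above:

```python
# Hypothetical helper: returns a copy of any payload pinned to one tab.
def for_tab(request: dict, target_id: str) -> dict:
    """Copy `request` and add the targetId returned by an earlier "open"."""
    pinned = dict(request)  # shallow copy so the caller's payload is untouched
    pinned["targetId"] = target_id
    return pinned


base = {"action": "snapshot", "profile": "openclaw"}
snapshot_tab = for_tab(base, "page-abc123")
```

Copying instead of mutating means the same base payload can be reused for other tabs.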
Handling Dialogs
JavaScript alerts, confirms, and prompts can interrupt automation. Use the dialog action:
{
"action": "dialog",
"profile": "openclaw",
"accept": true,
"promptText": "optional text for prompts"
}
Set accept: false to dismiss the dialog.
Advanced Patterns
Waiting for Elements
Sometimes you need to wait for an element to appear after an action:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "wait",
"text": "Success"
}
}
This waits for text "Success" to appear on the page.
Hover Actions
Trigger hover menus or tooltips:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "hover",
"ref": "e10"
}
}
Drag and Drop
For drag-and-drop interfaces:
{
"action": "act",
"profile": "openclaw",
"request": {
"kind": "drag",
"startRef": "e5",
"endRef": "e12"
}
}
Best Practices
Stop the browser session once your workflow is finished:
{
"action": "stop",
"profile": "openclaw"
}
Troubleshooting
Element not found: Take a fresh snapshot. Element refs change if the page updates.
Timeout errors: Increase the timeout or switch loadState to a less strict value, such as domcontentloaded instead of networkidle.
Stale page: Refresh before taking snapshot:
{
"action": "navigate",
"profile": "openclaw"
}
JavaScript not executing: Ensure page is fully loaded before calling evaluate.
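One way to avoid "element not found" errors is to resolve the element from a fresh snapshot immediately before acting on it. The sketch below assumes a hypothetical `send` callable that delivers a payload and returns the tool's result, and assumes the snapshot comes back as text in the [ref] role "name" format shown earlier; both are assumptions for illustration.

```python
# Sketch: re-resolve an element by role and name from a fresh snapshot before
# clicking, instead of reusing a ref that may have changed. `send` and the
# snapshot text format are assumptions, not documented OpenClaw behavior.
import re
from typing import Any, Callable

ELEMENT_LINE = re.compile(r'^\[(e\d+)\]\s+(\w+)\s+"(.*)"$')


def click_by_name(send: Callable[[dict], Any], role: str, name: str,
                  profile: str = "openclaw") -> str:
    """Take a fresh snapshot, find the element, click it, return its ref."""
    snapshot_text = send({"action": "snapshot", "profile": profile})
    for line in snapshot_text.splitlines():
        match = ELEMENT_LINE.match(line.strip())
        if match and match.group(2) == role and match.group(3) == name:
            ref = match.group(1)
            send({"action": "act", "profile": profile,
                  "request": {"kind": "click", "ref": ref}})
            return ref
    raise LookupError(f'no {role} named "{name}" in the current snapshot')
```

Raising an error when nothing matches makes a changed page fail loudly rather than clicking the wrong element.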
Configuration
OpenClaw's browser tool can be configured in openclaw.json:
{
"browser": {
"headless": true,
"defaultTimeout": 30000,
"viewport": {
"width": 1280,
"height": 720
}
}
}
Set headless: false for debugging to see the browser window.
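If your own scripts need the same settings, the browser section can be read with Python's standard json module. In this sketch the fallback values simply mirror the example above; they are assumptions, not documented OpenClaw defaults, and `browser_settings` is a hypothetical helper.

```python
# Sketch: read the "browser" section of openclaw.json and fill in missing
# keys. The fallback values mirror the example above; they are assumptions,
# not documented OpenClaw defaults.
import json

FALLBACKS = {
    "headless": True,
    "defaultTimeout": 30000,
    "viewport": {"width": 1280, "height": 720},
}


def browser_settings(config_text: str) -> dict:
    """Merge the file's "browser" section over the fallback values."""
    section = json.loads(config_text).get("browser", {})
    settings = {**FALLBACKS, **section}
    # Merge viewport one level deeper so a partial override keeps the other dimension.
    settings["viewport"] = {**FALLBACKS["viewport"], **section.get("viewport", {})}
    return settings


debug = browser_settings('{"browser": {"headless": false, "viewport": {"width": 1600}}}')
print(debug["headless"], debug["viewport"])  # False {'width': 1600, 'height': 720}
```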
Conclusion
OpenClaw's agent-browser provides a powerful, AI-friendly interface for web automation. By combining snapshots for understanding page state with actions for interaction, agents can automate complex web workflows without writing traditional scraping code.
The key is thinking in terms of: snapshot (understand), act (interact), evaluate (extract). This pattern handles everything from simple form filling to complex multi-step workflows.
Start with simple navigations and snapshots, then build up to more complex interactions as you get comfortable with the element reference system and action types.