BrowserControl is an MCP server that gives AI agents full web browsing capabilities: they can see, click, type, and interact with any website much as a human would. It takes a 'vision-first' approach, inspired by Google's AntiGravity IDE, in which the interactive elements in each screenshot are annotated with numbered boxes. This 'Set of Marks' (SoM) system lets the AI act through short, token-efficient commands (e.g., 'click(5)'), which reduces cost and latency compared with methods that rely on full DOM trees or base64 screenshots. It also includes built-in developer tools, session recording, and persistent browser sessions.
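To make the Set of Marks idea concrete, here is a minimal Python sketch of how numbered marks might map back to click targets. It is an illustration only, not BrowserControl's actual implementation: the `Element` type, the `assign_marks`, `describe`, and `click_point` helpers, and the reading-order numbering are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical interactive element detected on a page."""
    tag: str
    label: str
    box: tuple[int, int, int, int]  # x, y, width, height in pixels


def assign_marks(elements):
    """Number elements in reading order (top to bottom, then left to right)."""
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return {i: e for i, e in enumerate(ordered, start=1)}


def describe(marks):
    """Compact text legend an agent could receive alongside the screenshot."""
    return "\n".join(f"[{i}] {e.tag}: {e.label}" for i, e in marks.items())


def click_point(marks, mark_id):
    """Resolve a command like click(2) to the centre of the marked box."""
    x, y, w, h = marks[mark_id].box
    return (x + w // 2, y + h // 2)


# Made-up elements, purely for illustration.
elements = [
    Element("button", "Sign in", (900, 20, 80, 30)),
    Element("input", "Search", (300, 20, 400, 30)),
    Element("a", "Docs", (40, 100, 60, 20)),
]
marks = assign_marks(elements)
print(describe(marks))
print(click_point(marks, 2))  # (940, 35)
```

The point of the design is that the agent only has to emit a small integer; the server keeps the mapping from mark number to on-screen coordinates, so neither selectors nor DOM fragments need to pass through the model.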
Key Features
01 Vision-First Interaction (Set of Marks)
02 High Token Efficiency (Lower Cost & Faster Actions)
03 Built-in Developer Tools (Console, Network, Errors)
04 1 GitHub star
05 Session Recording and Playback
06 Persistent Browser Sessions (Cookies, Local Storage)