AI Agent Control: Send keyboard input and capture real JPEG/PNG screenshots
Enhanced OCR: Get text with precise bounding boxes and grid cell locations
Grid-Based Clicking: Click at grid references (e.g., "K9") instead of pixel coordinates
Console Access: Connect to VM consoles via noVNC through Proxmox API
Serial Console Access: Connect to VM serial console (xterm.js backend) for raw text I/O
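
To illustrate the grid-based clicking idea, here is a minimal sketch of how a reference such as "K9" could be mapped to a pixel coordinate. It is not taken from this project: the function name `grid_to_pixel`, the 100x100 pixel cell size, and the layout (columns lettered from A at the left, rows numbered from 1 at the top) are all assumptions made for the example.

```python
import re


def grid_to_pixel(ref: str, cell_w: int = 100, cell_h: int = 100) -> tuple[int, int]:
    """Return the pixel center (x, y) of a grid cell such as "K9".

    Hypothetical helper: assumes single-letter columns starting at "A" on the
    left and 1-based row numbers starting at the top of the screenshot.
    """
    match = re.fullmatch(r"([A-Za-z])(\d+)", ref.strip())
    if match is None:
        raise ValueError(f"invalid grid reference: {ref!r}")

    col = ord(match.group(1).upper()) - ord("A")  # "A" -> 0, "K" -> 10
    row = int(match.group(2)) - 1                 # "1" -> 0, "9" -> 8
    if row < 0:
        raise ValueError(f"row numbers start at 1: {ref!r}")

    # Aim at the middle of the cell rather than its top-left corner.
    return (col * cell_w + cell_w // 2, row * cell_h + cell_h // 2)
```

Under these assumptions, `grid_to_pixel("K9")` resolves to the center of the eleventh column and ninth row. Targeting the cell center gives an agent some tolerance when OCR bounding boxes are slightly off.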