- **LLM-friendly**: Accessibility-based interactions do not require computer vision models.
- **Fast and lightweight**: Uses native accessibility trees for most interactions, falling back to screenshot-based coordinates only when necessary.
- **Visual sense**: Evaluates and analyzes on-screen content to determine the optimal next action.
- **Deterministic tool application**: Reduces ambiguity by prioritizing structured data whenever possible.
- **Structured data extraction**: Extracts visible structured data from the screen.
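The accessibility-first strategy with a screenshot fallback can be sketched as below. All names here (`AccessibilityNode`, `find_node`, `locate_on_screenshot`) are illustrative stubs assumed for this sketch, not this project's actual API:

```python
# Hypothetical sketch: prefer structured accessibility-tree nodes,
# fall back to screenshot coordinates only when no matching node exists.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class AccessibilityNode:
    role: str
    label: str
    center: Tuple[int, int]  # screen coordinates of the element's center

def find_node(tree: List[AccessibilityNode], label: str) -> Optional[AccessibilityNode]:
    """Deterministic lookup in the accessibility tree (structured path)."""
    return next((n for n in tree if n.label == label), None)

def locate_on_screenshot(label: str) -> Tuple[int, int]:
    """Fallback: resolve a click target from a screenshot (stubbed here)."""
    return (640, 360)  # placeholder coordinates for illustration

def click_target(tree: List[AccessibilityNode], label: str) -> Tuple[int, int]:
    node = find_node(tree, label)
    if node is not None:
        return node.center              # fast, structured path
    return locate_on_screenshot(label)  # vision-based fallback

tree = [AccessibilityNode("button", "Submit", (120, 48))]
print(click_target(tree, "Submit"))  # → (120, 48), structured hit
print(click_target(tree, "Cancel"))  # → (640, 360), screenshot fallback
```

Keeping the structured path first is what makes tool application deterministic: identical accessibility trees always yield identical targets, and the vision fallback only runs when the tree has no answer.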