Evaluates large language models' meta-level reasoning within the interactive puzzle game "Baba Is You".
Baba Is Eval is a platform for assessing the meta-level reasoning of large language models in a dynamic, interactive setting. It builds on the rule-manipulation mechanics of the puzzle game "Baba Is You": models interact with the game through text commands, moving word blocks to form rules. The project includes an MCP server that exposes a suite of tools for receiving game state, executing actions, and controlling game flow, turning the game into a testbed for AI evaluation.
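To make the rule-manipulation mechanic concrete, here is a minimal sketch of how rules could be read off a grid of word blocks. In "Baba Is You", a rule is active when word blocks spell NOUN IS X from left to right or from top to bottom. The grid encoding (uppercase strings for word blocks, lowercase for objects, "." for empty cells), the small vocabulary, and the name `find_rules` are all assumptions made for illustration; they are not taken from the project.

```python
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny vocabulary; the game has many more words.
NOUNS = {"BABA", "ROCK", "WALL", "FLAG"}
PROPERTIES = {"YOU", "WIN", "STOP", "PUSH"}

Grid = List[List[str]]


def find_rules(grid: Grid) -> Set[Tuple[str, str]]:
    """Collect (subject, target) pairs spelled as NOUN IS X.

    Rules are read left-to-right and top-to-bottom only.
    X may be a property (BABA IS YOU) or another noun (ROCK IS FLAG).
    """
    rules: Set[Tuple[str, str]] = set()
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] not in NOUNS:
                continue
            # Check the two cells to the right, then the two cells below.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + 2 * dr, c + 2 * dc
                if r2 >= rows or c2 >= cols:
                    continue
                middle = grid[r + dr][c + dc]
                target = grid[r2][c2]
                if middle == "IS" and target in (PROPERTIES | NOUNS):
                    rules.add((grid[r][c], target))
    return rules


# Assumed example level: one horizontal rule and one vertical rule.
EXAMPLE: Grid = [
    ["BABA", "IS", "YOU", "."],
    [".", ".", ".", "FLAG"],
    [".", "baba", ".", "IS"],
    [".", ".", ".", "WIN"],
]

print(sorted(find_rules(EXAMPLE)))
```

A model that pushes the WIN block out of line would make the FLAG IS WIN pair disappear from this set, which is the kind of rule change the evaluation is designed to probe.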
Key Features
01 Integration with MCP (Model Context Protocol) servers
02 Ability to undo multiple previous actions
03 Execution of chained in-game movement commands
04 Programmatic level entry and exit
05 Dynamic retrieval of game state as a matrix
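The features above can be pictured with a toy in-memory stand-in. This is only a sketch: the class `ToySession`, its method names, the comma-separated command format, and the matrix encoding are hypothetical and do not reflect the project's actual MCP tools, which this listing does not name. It shows chained movement commands, undoing several steps at once, and reading the state back as a matrix.

```python
from typing import List, Tuple

# Row/column deltas for each movement command (assumed command names).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class ToySession:
    """Hypothetical stand-in for a game session; not the project's API."""

    def __init__(self, rows: int, cols: int, start: Tuple[int, int]) -> None:
        self.rows, self.cols = rows, cols
        self.pos = start
        self.history: List[Tuple[int, int]] = []  # positions before each step

    def execute(self, commands: str) -> None:
        """Run a chain such as "right,up,up", one step at a time."""
        for name in commands.split(","):
            dr, dc = MOVES[name.strip()]
            self.history.append(self.pos)
            # Clamp to the board so a move into the edge leaves the position unchanged.
            r = min(max(self.pos[0] + dr, 0), self.rows - 1)
            c = min(max(self.pos[1] + dc, 0), self.cols - 1)
            self.pos = (r, c)

    def undo(self, n: int = 1) -> None:
        """Revert up to n steps, stopping early if the history runs out."""
        for _ in range(min(n, len(self.history))):
            self.pos = self.history.pop()

    def state(self) -> List[List[str]]:
        """Return the board as a matrix of strings, "." for empty cells."""
        grid = [["." for _ in range(self.cols)] for _ in range(self.rows)]
        grid[self.pos[0]][self.pos[1]] = "baba"
        return grid


session = ToySession(rows=3, cols=3, start=(1, 1))
session.execute("right,up,up")  # the second "up" is blocked by the top edge
session.undo(2)                 # revert both "up" steps
for row in session.state():
    print(" ".join(row))
```

Keeping a history stack of prior positions is one simple way to support multi-step undo; a real integration would rely on the game's own undo mechanism instead.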
Use Cases
01 Evaluating large language models' meta-level reasoning
02 Benchmarking AI performance in interactive puzzle-solving
03 Developing and testing AI agents for complex rule-manipulation environments