Support for cloud-based Langfuse LLM-as-judge prompts
Side-by-side run comparison and failure analysis tools
Integration for custom local Python evaluator scripts
Automated experiment execution against Langfuse datasets
Configurable concurrency for high-volume evaluation tasks
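To illustrate how two of the features above fit together, here is a minimal sketch of a custom local Python evaluator run with configurable concurrency. All names and the result shape (`exact_contains_evaluator`, `run_evaluations`, the `score` dict) are hypothetical and do not reflect the tool's actual plugin API; it only shows the general pattern of scoring dataset items in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def exact_contains_evaluator(output: str, expected: str) -> dict:
    # Hypothetical evaluator: pass if the expected answer appears
    # (case-insensitively) in the model output.
    passed = expected.lower() in output.lower()
    return {"name": "exact_contains", "score": 1.0 if passed else 0.0}

def run_evaluations(items, evaluator, max_workers=4):
    # Configurable concurrency: score dataset items in parallel,
    # preserving input order in the returned list.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda it: evaluator(it["output"], it["expected"]), items)
        )

items = [
    {"output": "The capital is Paris.", "expected": "Paris"},
    {"output": "I am not sure.", "expected": "Berlin"},
]
results = run_evaluations(items, exact_contains_evaluator, max_workers=2)
```

In practice the evaluator function would be loaded from a user-supplied script and applied to items fetched from a Langfuse dataset, with `max_workers` tuned to the evaluation volume.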