Agent Evaluation: LLM Benchmarking Claude Code Skill