About
The RAG Evaluation Skill provides a framework for auditing and optimizing Retrieval-Augmented Generation pipelines directly within Claude Code. It measures retrieval accuracy with standard metrics such as Recall@K and MRR, and assesses generation quality through LLM-as-judge scoring for faithfulness and relevance. Whether you are testing local configurations or benchmarking against production-grade APIs like Ailog, the skill helps surface retrieval bottlenecks, hallucination risks, and latency issues before they degrade response quality.
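As a rough illustration of the kind of metrics involved, the sketch below shows per-query Recall@K and reciprocal rank (averaged over queries, the latter gives MRR), plus a minimal LLM-as-judge faithfulness check. The function names, the `judge` callable, and the document IDs are illustrative assumptions, not the skill's actual API.

```python
from typing import Callable, Sequence


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def reciprocal_rank(retrieved_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none is retrieved).

    Averaging this value over a query set yields MRR.
    """
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def judge_faithfulness(question: str, context: str, answer: str,
                       judge: Callable[[str], str]) -> bool:
    """Ask a judge model whether the answer is grounded in the retrieved context.

    `judge` is a hypothetical callable that sends a prompt to an LLM and returns
    its raw text response; wire it to whatever model client you actually use.
    """
    prompt = (
        "You are grading a RAG answer for faithfulness.\n"
        f"Question: {question}\n"
        f"Retrieved context: {context}\n"
        f"Answer: {answer}\n"
        "Reply with exactly YES if every claim in the answer is supported "
        "by the context, otherwise reply NO."
    )
    return judge(prompt).strip().upper().startswith("YES")


# Example: one query with ground-truth relevant documents (illustrative IDs)
retrieved = ["doc_7", "doc_2", "doc_9", "doc_4", "doc_1"]
relevant = {"doc_2", "doc_4"}
print(f"Recall@3        = {recall_at_k(retrieved, relevant, k=3):.2f}")  # 0.50
print(f"Reciprocal rank = {reciprocal_rank(retrieved, relevant):.2f}")   # 0.50
```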