About
This skill provides a production-ready framework for managing 'golden datasets'—the curated ground-truth examples used to benchmark RAG systems, evaluate search quality, and prevent regressions. It automates critical data protection tasks including JSON-based backups for version control, content validation via URL contracts, and disaster recovery workflows. By separating content from embeddings, it ensures your evaluation data remains portable and reproducible across different environments and model versions, making it an essential tool for AI engineers focused on retrieval quality and system reliability.