About
This skill provides a comprehensive framework for tracking the lifecycle of failures within AI agent workflows. It enables developers to distinguish between transient and permanent errors, monitor retry success rates, visualize fallback paths, and manage rate limits effectively. By providing deep insights into failure patterns and recovery behaviors, it helps optimize LLM performance, reduce wasted token costs, and implement resilient circuit breaker patterns for more stable production deployments.