Discover Agent Skills for analytics & monitoring. Browse 47 skills for Claude, ChatGPT & Codex.
Refactors complex Go codebases using automated metrics and BAIME-aligned protocols to reduce cyclomatic complexity and improve test coverage.
Transforms subjective code debt into objective, data-driven paydown strategies using the industry-standard SQALE methodology.
Analyzes architectural health using the Balanced Coupling framework to generate interactive dependency reports and refactoring insights.
Automates the detection, classification, and resolution of system errors using a 13-category taxonomy and systematic recovery patterns.
Implements the three pillars of observability—structured logging, metrics, and distributed tracing—to provide deep visibility into application health and performance.
Implements a systematic error-handling methodology using a 13-category taxonomy to diagnose, recover from, and prevent session failures.
Implements comprehensive logging, metrics, and distributed tracing to ensure production reliability and performance monitoring.
Manages production incidents using SRE methodologies for rapid investigation, mitigation, and postmortem documentation.
Implements production-grade observability for Cloudflare Workers using structured logging, real-time log streaming, and custom performance metrics.
Analyzes and optimizes application performance across algorithms, databases, and frontend frameworks to resolve bottlenecks and reduce latency.
Implements production-grade monitoring, logging, and tracing systems to ensure application reliability and performance.
Diagnoses complex software errors using automated stack trace analysis and systematic root cause investigation.
Simplifies Google Tag Manager (GTM) implementations by providing expert guidance on tags, triggers, variables, and data layer configurations.
Validates Prometheus metrics implementation in Go applications to ensure optimal observability and performance.
Profiles and optimizes OCaml memory allocations to eliminate boxing overhead and improve application performance.
Implements comprehensive request tracking across microservices using Jaeger and Tempo to identify performance bottlenecks and service dependencies.
Identifies, quantifies, and prioritizes technical debt within codebases using ROI-based remediation plans.
Configures Prometheus for comprehensive metric collection, alerting, and observability across infrastructure and applications.
Implements service reliability targets using SLIs, SLOs, and error budgets to balance innovation velocity with system stability.
Implements comprehensive Kafka monitoring using Prometheus and Grafana to track cluster health, consumer lag, and broker performance.
Guides the configuration and troubleshooting of dbt MCP server connections for AI-powered data engineering and analytics workflows.
Diagnoses and resolves failures in dbt Cloud and platform jobs using systematic workflows and root cause analysis.
Answers complex business questions by intelligently navigating dbt semantic layers, models, and project artifacts.
Automates the creation and management of production-grade Grafana dashboards for real-time system and application observability.
Automates error capture, intelligent batching, and structured logging to streamline AI agent recovery and orchestration.
Monitors Hawk job status, retrieves logs, and diagnoses issues for AI evaluation runs within the UK AISI Inspect framework.
Standardizes incident management processes, from initial detection and triage to postmortem analysis and reliability improvements.
Guides the definition of reliability targets, selection of service indicators, and implementation of error budget policies.
Architects robust, hook-based event systems to monitor and broadcast real-time AI agent activities within Claude Code workflows.
Implements and debugs request flows across microservices using OpenTelemetry standards and distributed tracing patterns.