About
This skill provides standardized implementation patterns for managing shared GPU resources across multiple AI services such as Ollama, Whisper, and ComfyUI. It addresses the common Out-of-Memory (OOM) bottleneck by combining retry loops around model loading, configurable idle timeouts that unload unused models, and a signaling protocol that lets services request VRAM clearance from one another. It is particularly useful for developers running multiple local AI models on a single GPU who need stable, automated handovers without manual intervention.
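As a rough illustration of how these pieces could fit together, the Python sketch below pairs an OOM-aware retry loop with an idle-timeout unloader. Everything here is a hypothetical stand-in, not part of any named service's API: `VRAMExhaustedError`, the `load_model`, `unload_model`, and `request_vram_release` callables, and the timing defaults are all assumptions for illustration.

```python
import threading
import time


class VRAMExhaustedError(RuntimeError):
    """Hypothetical error raised when a model load fails for lack of VRAM."""


def load_with_retry(load_model, request_vram_release, attempts=3, backoff_s=5.0):
    """Try to load a model; on OOM, ask peer services to free VRAM, then retry."""
    for attempt in range(1, attempts + 1):
        try:
            return load_model()
        except VRAMExhaustedError:
            if attempt == attempts:
                raise
            request_vram_release()           # hypothetical: e.g. notify peers to unload
            time.sleep(backoff_s * attempt)  # give the release time to take effect


class IdleUnloader:
    """Unloads a model after idle_timeout_s seconds without a touch()."""

    def __init__(self, unload_model, idle_timeout_s=300.0):
        self._unload = unload_model
        self._timeout = idle_timeout_s
        self._last_used = time.monotonic()
        self._lock = threading.Lock()
        threading.Thread(target=self._watch, daemon=True).start()

    def touch(self):
        """Record that the model was just used, resetting the idle clock."""
        with self._lock:
            self._last_used = time.monotonic()

    def _watch(self):
        # Poll a few times per timeout window; unload once the model sits idle.
        while True:
            time.sleep(self._timeout / 10)
            with self._lock:
                idle = time.monotonic() - self._last_used
            if idle >= self._timeout:
                self._unload()  # free VRAM for the next service
                break
```

A caller would wire its own loaders into these helpers, for example `load_with_retry(lambda: start_ollama("llama3"), notify_peers_to_release)`, where both callables are placeholders supplied by the hosting service. The signaling transport (HTTP endpoint, Unix socket, or otherwise) is left to the implementation.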