- Inference performance benchmarking (tokens/sec and eval timing)
- Real-time monitoring for NVIDIA (SMI) and AMD (ROCm) hardware
- Detailed VRAM tracking per model via Ollama API integration
- Automated GPU health checks and container troubleshooting
- Active utilization monitoring during live inference sessions
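The tokens/sec figure can be derived from the timing fields Ollama returns with each generation: `eval_count` (tokens produced) and `eval_duration` (decode time in nanoseconds) appear in `/api/generate` responses. A minimal sketch of that computation, using a hard-coded sample response rather than a live call:

```python
def tokens_per_second(resp: dict) -> float:
    """Decode throughput from an Ollama /api/generate response.

    Ollama reports eval_count (tokens generated) and eval_duration
    (nanoseconds spent decoding); tokens/sec = count / seconds.
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9


# Sample response fragment (values are illustrative, not measured)
sample = {"eval_count": 128, "eval_duration": 4_000_000_000}  # 4 s of decoding
print(f"{tokens_per_second(sample):.1f} tok/s")  # → 32.0 tok/s
```

`prompt_eval_count` / `prompt_eval_duration` can be handled the same way to time prompt processing separately from decoding.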
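On NVIDIA hardware, real-time utilization data typically comes from polling `nvidia-smi` in its machine-readable CSV mode (on AMD, `rocm-smi` plays the analogous role). A sketch of one poll cycle, split so the parsing is separate from the subprocess call; the query field names are standard `nvidia-smi --query-gpu` properties:

```python
import csv
import io
import subprocess

QUERY = "index,name,utilization.gpu,memory.used,memory.total,temperature.gpu"


def parse_smi_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output
    into one dict per GPU, keyed by the queried property names."""
    keys = QUERY.split(",")
    rows = csv.reader(io.StringIO(text))
    return [dict(zip(keys, (v.strip() for v in row))) for row in rows if row]


def sample_gpus() -> list[dict]:
    """Run one nvidia-smi poll and return per-GPU stats (requires an NVIDIA GPU)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)
```

Running the poll in a loop with a short sleep gives the live view during inference; the same parse function works unchanged on multi-GPU hosts, since `nvidia-smi` emits one CSV row per device.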
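Per-model VRAM tracking leans on Ollama's `/api/ps` endpoint, which lists the currently loaded models along with a `size_vram` field (bytes resident on the GPU). A sketch that separates the HTTP call from the summarization, so the latter is testable offline:

```python
import json
import urllib.request


def summarize_vram(ps: dict) -> dict[str, int]:
    """Map each loaded model name to its VRAM footprint in bytes,
    given a parsed Ollama /api/ps response body."""
    return {m["name"]: m.get("size_vram", 0) for m in ps.get("models", [])}


def vram_by_model(base_url: str = "http://localhost:11434") -> dict[str, int]:
    """Query a running Ollama server and summarize VRAM usage per model."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return summarize_vram(json.load(resp))
```

Comparing `size_vram` against `size` (the model's total footprint) in the same response shows whether a model is fully GPU-resident or partially offloaded to system RAM.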