About
This skill enables Claude to manage local LLM deployments using Mozilla Llamafile, a cross-platform format that packages a large language model and its runtime into a single executable, with no cloud dependencies. It covers installing llamafile binaries, selecting optimized GGUF models, and configuring servers with GPU acceleration via CUDA, Metal, or Vulkan. Whether building air-gapped tools, troubleshooting server connections, or integrating local inference into developer workflows through LiteLLM or the OpenAI SDK, the skill supports an offline-first AI environment.
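As a minimal sketch of the integration path mentioned above: llamafile exposes an OpenAI-compatible HTTP API when run in server mode (e.g. `./model.llamafile --server`), which by default listens on `localhost:8080`. The snippet below uses only the Python standard library to call that endpoint; the base URL, the placeholder model name, and the helper names are assumptions for illustration, not part of any SDK.

```python
import json
import urllib.request

# Assumption: a llamafile server is already running locally and serving
# its OpenAI-compatible API at this address (the default may differ).
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL):
    """Build the URL and JSON payload for a chat-completion call."""
    url = f"{base_url}/chat/completions"
    payload = {
        # llamafile serves whichever model it was launched with;
        # "local" is a placeholder name, not a required identifier.
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload


def chat(prompt: str) -> str:
    """POST the request to the local server and return the reply text."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint mirrors the OpenAI API shape, the same server also works with the OpenAI SDK or LiteLLM by pointing their base URL at the local address instead of the cloud service.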