- Multiple model backends: transformers, bitsandbytes, AutoGPTQ, and llama.cpp.
- OpenAI-compatible API for Llama 2 models, enabling use with existing OpenAI clients.
- Provides `llama2-wrapper` as a drop-in local Llama 2 backend for generative agents and apps.
- Runs on GPU or CPU across Linux, Windows, and macOS.
- Supports all Llama 2 model variants (7B, 13B, 70B, GPTQ, GGML, GGUF, CodeLlama) with 8-bit and 4-bit inference.
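Because the server exposes an OpenAI-compatible API, any standard HTTP client can talk to it. The sketch below uses only the Python standard library; the base URL, port, and model name are assumptions for a typical local setup, not values taken from this project's documentation.

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port to match your server config.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "llama-2-7b-chat") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def chat(prompt: str) -> str:
    """POST the payload to the local OpenAI-compatible endpoint."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI chat responses put the reply text here.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Say hello in one sentence."))
```

Because the request/response schema follows the OpenAI spec, the official `openai` client library can also be pointed at the local server by overriding its base URL.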