Big architectural updates for the app's maintainability and flexibility: DinD (Docker-in-Docker), GPU autodetection, model backup and restore, and more.
LlamaMan is a self-hosted web UI for managing llama.cpp instances. Check out what's new in version 0.8.6.
Ollama was too slow, and apparently nobody had built a proper UI for managing llama.cpp instances in Docker. So I did. LlamaMan gives you full control over GPU layer offload, real-time VRAM monitoring, one-click model launches from presets, and an Ollama-compatible proxy. Having control over your own hardware shouldn't be a novel concept in 2026.