Lessons from building a permanent companion agent on local hardware

r/LocalLLaMA
Generative AI AI Research

I've been running a self-hosted agent on an M4 Mac mini for a few months now and wanted to share some things I've learned that I don't see discussed much. The setup: Rust runtime, qwen2.5:14b on Ollama for fast local inference, with a model ladder that escalates to cloud models when the task requires it. SQLite memory with local embeddings (nomic-embed-text) for semantic recall across sessions. The agent runs 24/7 via launchd, monitors a trading bot, checks email, deploys websites, and delegates heavy implementation work to Claude Code through a task runner.