PSA: llama-swap released a new grouping feature, matrix, that lets you fine-tune which models can run together

Previously, a model could belong to only one group. Now you can create whatever groups you want: one for big models that should run on their own, one for STT plus a bigger model, one for RAG use cases, etc. It'll intelligently unload models based on the "cost" of doing so.

Check out the config: config.example.yaml on the main branch of mostlygeek/llama-swap. The comments for the new section read:

    # matrix: run concurrent models with a solver-based swap DSL
    #
    # Note:
    # A config must use either a matrix or legacy groups, not both. A configuration error
    # will occur if both are defined.
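
To give a feel for what "unload based on cost" could mean, here is a rough Python sketch. It is not llama-swap's actual solver or config syntax. The model names, the `allowed_sets` structure, and the per-model `unload_cost` numbers are all made up for illustration. The idea: each allowed set lists models that may run together, and when a request comes in we pick the set containing the requested model that evicts the cheapest combination of already-loaded models.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def plan_swap(
    requested: str,
    loaded: Set[str],
    allowed_sets: List[FrozenSet[str]],
    unload_cost: Dict[str, int],
) -> Tuple[Set[str], Set[str]]:
    """Return (models_to_unload, models_running_afterwards).

    Hypothetical illustration only: picks the allowed set that contains
    the requested model and has the lowest total cost of evicting
    currently loaded models that are not in that set.
    """
    candidates = [s for s in allowed_sets if requested in s]
    if not candidates:
        raise ValueError(f"no allowed set contains {requested!r}")

    def eviction_cost(target: FrozenSet[str]) -> int:
        # Models outside the target set must be unloaded; default cost is 1.
        return sum(unload_cost.get(m, 1) for m in loaded - target)

    target = min(candidates, key=eviction_cost)
    to_unload = loaded - target
    running = (loaded - to_unload) | {requested}
    return to_unload, running


# Hypothetical groupings: a big model alone, STT + medium, a RAG stack.
allowed_sets = [
    frozenset({"big"}),
    frozenset({"stt", "medium"}),
    frozenset({"embed", "rerank", "medium"}),
]
unload_cost = {"big": 10, "medium": 5, "stt": 1, "embed": 1, "rerank": 1}

# "medium" is shared by two sets, so asking for "embed" only evicts "stt".
print(plan_swap("embed", {"stt", "medium"}, allowed_sets, unload_cost))
```

The point of the sketch is just that a model can now appear in more than one set, so a swap does not have to tear down everything that is loaded. For the real syntax and behavior, go by the example config in the repo.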