DGX Spark just arrived — planning to run vLLM + local models, looking for advice

Just got a DGX Spark set up today and starting to configure it for local LLM inference. Plan is to run: • vLLM • PyTorch • Hugging Face models as a local API backend for an application I’m building (education / analytics use case, trying to keep everything local/private). I’ve mostly been working with cloud GPUs up to now, so this is my first time running something like this fully on-prem.