How do you actually pick which GPU to rent for inference?

Every time I need to spin up a vLLM workload, I end up with six tabs open (RunPod, Vast.ai, Lambda, random benchmark threads) trying to figure out what will actually fit in VRAM and what it will cost. It feels like there should be a better way, but I haven't found one. What do you use? Are there any tools that actually help, or is it just vibes and trial and error until something OOMs?

submitted by /u/Major_Border149