I benchmarked three LLM inference providers this week and one route surprised me

I've been running some personal benchmarks comparing inference latency across three API providers for a side project I'm tinkering with. The goal was dead simple: send identical prompts, measure time-to-first-token and tokens-per-second, and see what shakes out. One setup I didn't expect much from was a relatively new endpoint I stumbled across.
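
For anyone who wants to try something similar, here is a minimal sketch of how those two metrics could be measured. It is not necessarily the harness used for these runs. It assumes the provider's client exposes the response as a lazy iterator of tokens (or chunks), so the request is only sent once iteration starts. The names `measure_stream`, `StreamTiming`, and `fake_stream` are hypothetical and exist only for illustration; `fake_stream` stands in for a real streaming call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class StreamTiming:
    ttft_s: Optional[float]        # seconds from start until the first token arrived
    tokens: int                    # number of tokens (or chunks) received
    tokens_per_s: Optional[float]  # rate over the tokens that followed the first one


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Consume a token stream and time it (hypothetical helper)."""
    start = clock()
    first: Optional[float] = None
    last = start
    count = 0
    for _ in stream:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        count += 1
    if first is None:
        return StreamTiming(None, 0, None)
    # Exclude the wait for the first token so the rate reflects generation speed only.
    decode_time = last - first
    rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else None
    return StreamTiming(first - start, count, rate)


def fake_stream(n: int, first_delay: float, gap: float) -> Iterator[str]:
    """Stand-in for a provider's streaming response (assumption, not a real API)."""
    time.sleep(first_delay)
    for i in range(n):
        if i:
            time.sleep(gap)
        yield f"tok{i}"


if __name__ == "__main__":
    result = measure_stream(fake_stream(n=5, first_delay=0.05, gap=0.01))
    print(result)
```

Keeping time-to-first-token separate from the rate matters because a provider with a slow start but fast generation looks very different from one with the opposite profile. Note that if a client yields multi-token chunks, the count here is chunks, so the rate is only comparable across providers after converting to real token counts.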