Sharing a simple Python script to benchmark LLM inference latency across different providers

Dev.to AI
Generative AI AI Research

Was tinkering with some latency measurements lately and wanted to share a quick Python snippet that might help others evaluating inference endpoints. The goal was simple: send identical prompts to different providers and measure time-to-first-token and total generation time. Nothing fancy, but useful when you're trying to decide where to route production traffic.