[D] Tested model routing on financial AI datasets: good savings, and curious what benchmarks others use.

Ran a benchmark evaluating whether prompt complexity-based routing delivers meaningful savings. Used public HuggingFace datasets. Here's what I found.

Setup

Baseline: Claude Opus for everything.

Tested two strategies:

- Intra-provider: routes within the same provider by complexity. Simple → Haiku, Medium → Sonnet, Complex → Opus.
- Flexible: medium prompts go to self-hosted Qwen 3.5 27B / Gemma 3 27B.
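
For anyone who wants to replicate the setup, here is a rough sketch of how the baseline and the two strategies could be expressed as lookup tables. Everything named here is hypothetical: the `Complexity` enum, the `route` and `routed_cost` helpers, and the model labels (shorthand, not real API model IDs). It also assumes the flexible strategy keeps the intra-provider routes for simple and complex prompts, since only the medium route is stated above, and it takes the complexity label as given rather than guessing how prompts were classified.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


# Model labels are shorthand for illustration, not exact API model IDs.
BASELINE = {c: "opus" for c in Complexity}

INTRA_PROVIDER = {
    Complexity.SIMPLE: "haiku",
    Complexity.MEDIUM: "sonnet",
    Complexity.COMPLEX: "opus",
}

# Assumption: only the medium tier changes; simple and complex routes
# are carried over from the intra-provider table.
FLEXIBLE = {**INTRA_PROVIDER, Complexity.MEDIUM: "self-hosted-27b"}


def route(complexity, strategy):
    """Return the model label a strategy assigns to a complexity tier."""
    return strategy[complexity]


def routed_cost(counts, strategy, cost_per_prompt):
    """Total cost of a workload.

    counts maps Complexity -> number of prompts in that tier;
    cost_per_prompt maps model label -> your own measured average cost.
    """
    return sum(n * cost_per_prompt[route(c, strategy)] for c, n in counts.items())
```

Savings for a strategy would then be `1 - routed_cost(counts, strategy, costs) / routed_cost(counts, BASELINE, costs)`, with `counts` and `costs` filled in from your own workload and pricing.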