I compared harrier-27b vs voyage-4 vs zembed-1 across 24 datasets. 27B parameters

I've been running embedding model evals for a while now, and Microsoft's Harrier family dropped a new model. btw harrier-27b hit on binary MTEB at launch. That's not nothing. So I put it through the same graded evaluation pipeline I use for everything else: 24 datasets, three independent LLM judges, continuous relevance scores from 0 to 10. No binary pass/fail.

The global numbers

| Model | NDCG | Recall |
|---|---|---|
| zembed-1 | 0.701 | 0.750 |
| voyage-4 | 0.699 | 0.731 |
| harrier-27b | 0.699 | 0.728 |

On NDCG, it's basically a three-way tie at the top. harrier-27b is legitimately competitive, and I won't pretend otherwise.
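For anyone wondering what graded NDCG looks like in practice, here is a minimal sketch, not the actual pipeline. It assumes each document's relevance is the plain mean of the three judges' 0-10 scores, that gain is linear in that mean, and that the cutoff is k=10; none of these choices is stated above, and the function names are hypothetical.

```python
import math
from statistics import mean


def dcg(relevances):
    """Discounted cumulative gain with linear gain (an assumption of this sketch)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def graded_ndcg(ranked_doc_ids, judge_scores, k=10):
    """NDCG@k for one query using continuous relevance scores.

    ranked_doc_ids: doc ids in the order the embedding model retrieved them.
    judge_scores:   dict of doc id -> list of 0-10 scores, one per judge.
    """
    # Assumption: collapse the judges into one grade per document by averaging.
    graded = {doc: mean(scores) for doc, scores in judge_scores.items()}

    # Documents the judges never scored count as relevance 0.
    gains = [graded.get(doc, 0.0) for doc in ranked_doc_ids[:k]]

    # The ideal ranking sorts every judged document by its grade.
    ideal = sorted(graded.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Made-up scores for illustration only, not data from the eval.
    scores = {"a": [9, 8, 10], "b": [5, 5, 5], "c": [0, 1, 2]}
    print(graded_ndcg(["a", "b", "c"], scores))
    print(graded_ndcg(["c", "b", "a"], scores))
```

With graded scores, a model that ranks a 9 above a 5 gets more credit than one that swaps them, which a binary relevant/not-relevant label would not capture.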