Forgive my ignorance, but how is a 27B model better than a 397B one?

Is Qwen just incredibly good at dense models and not so good at MoE? I get that dense is generally better than MoE parameter for parameter, but 27B beating 397B just doesn’t sit right with me. What are all those additional experts even doing, then?

submitted by /u/No_Conversation9561