AIfred Intelligence benchmarks: 9 models debating "Dog vs Cat" in a multi-agent tribunal, quality vs speed across 80B-235B (that's AIfred with an uppercase "I", not a lowercase "L" :-)

Hey r/LocalLLaMA,

Some of you might remember [my post from New Year's] about AIfred Intelligence - the self-hosted AI assistant with multi-agent debates, web research, and a voice interface. I promised model benchmarks back then. Here they are!

What I did: I ran the same question - "What is better, dog or cat?" - through AIfred's Tribunal mode across 9 different models. In Tribunal mode, AIfred (the butler) argues his case, then Sokrates (the philosopher) tears it apart. They go 2 rounds, and finally Salomo (the judge) delivers a verdict. 18 sessions total, in both German and English.
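For anyone who wants the turn order spelled out, here is a rough Python sketch of the Tribunal flow as described above. It is not AIfred's actual code: `run_tribunal`, the `Speaker` type, and `stub` are hypothetical names made up for illustration, and I'm assuming each of the 2 rounds is one AIfred turn followed by one Sokrates turn, with Salomo speaking once at the end.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: a speaker gets the question plus the transcript so far
# and returns its reply. In a real setup this would wrap an LLM call.
Transcript = List[Tuple[str, str]]
Speaker = Callable[[str, Transcript], str]


def run_tribunal(question: str, aifred: Speaker, sokrates: Speaker,
                 salomo: Speaker, rounds: int = 2) -> Transcript:
    """Run the debate: AIfred argues, Sokrates rebuts, repeat, then Salomo judges."""
    transcript: Transcript = []
    for _ in range(rounds):
        # Assumption: one round = one AIfred turn, then one Sokrates turn.
        transcript.append(("AIfred", aifred(question, transcript)))
        transcript.append(("Sokrates", sokrates(question, transcript)))
    # The judge sees the full exchange and delivers the verdict last.
    transcript.append(("Salomo", salomo(question, transcript)))
    return transcript


def stub(name: str) -> Speaker:
    """Placeholder speaker so the sketch runs without any model behind it."""
    def speak(question: str, transcript: Transcript) -> str:
        return f"{name} speaking at turn {len(transcript) + 1}"
    return speak


if __name__ == "__main__":
    debate = run_tribunal("What is better, dog or cat?",
                          stub("AIfred"), stub("Sokrates"), stub("Salomo"))
    for speaker, text in debate:
        print(f"{speaker}: {text}")
```

With the stubs swapped for real model calls, one `run_tribunal` call would correspond to one benchmark session under these assumptions.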