
Built a political benchmark for LLMs. KIMI K2 can't answer questions about Taiwan (obviously). GPT-5.3 refuses 100% of questions when given an opt-out. [P]

r/MachineLearning

I spent the last few days building a benchmark that maps where frontier LLMs fall on a 2D political compass (economic left/right + social progressive/conservative) using 98 structured questions across 14 policy areas. I tested GPT-5.3, Claude Opus 4.6, and KIMI K2. The results are interesting.

The repo is fully open-source -- run it yourself on any model with an API:

The headline finding: silence is a political stance

Most LLM benchmarks throw away refusals as "missing data." We score them.
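
To make "refusals are data" concrete, here is a minimal sketch of one way to keep refusals in the results instead of dropping them. It is not the repo's actual scoring code: the `Response` class, the `compass_position` function, the -1.0 to +1.0 score range, and the choice to report a per-axis refusal rate next to the mean are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """One model answer. Hypothetical structure, not taken from the repo."""
    axis: str               # "economic" or "social"
    score: Optional[float]  # assumed range -1.0 .. +1.0; None means the model refused


def compass_position(responses):
    """Summarize each axis as a mean score plus a refusal rate.

    Refusals are excluded from the mean but counted and reported,
    so they never silently vanish as "missing data".
    """
    summary = {}
    for axis in ("economic", "social"):
        items = [r for r in responses if r.axis == axis]
        answered = [r.score for r in items if r.score is not None]
        refused = len(items) - len(answered)
        summary[axis] = {
            # None signals that there was nothing to average on this axis.
            "mean": sum(answered) / len(answered) if answered else None,
            "refusal_rate": refused / len(items) if items else 0.0,
        }
    return summary


if __name__ == "__main__":
    demo = [
        Response("economic", -1.0),
        Response("economic", None),   # refusal
        Response("social", 0.5),
        Response("social", 0.5),
    ]
    print(compass_position(demo))
```

Under these assumptions, a model that opts out of every question shows up with a refusal rate of 1.0 and no mean, rather than looking like a model that was never tested.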