People pissed about ARC-AGI-3 are looking at the purpose of the benchmark wrong
r/singularity
Generative AI
No, it's not meant to make AI models look dumb. The prompts given to the AI were pretty much exactly the same as those given to humans: just do the test and try to complete it. Humans weren't told to use the fewest steps either. And even with all the prompt engineering and harness work going on right now, the improvements aren't substantial. The purpose of the benchmark was to test whether SOTA models have reached its creators' definition of AGI. Whether or not the models are given stronger prompts or harnesses, they fail either way.