ARC-AGI-3 scores are not calculated the same way as ARC-AGI-1 or ARC-AGI-2
r/singularity
From their paper, page 11:

This scoring function is called RHAE (Relative Human Action Efficiency), pronounced “Ray”. The procedure can be summarized as follows:

• “Score the AI test taker by its per-level action efficiency” - For each level that the test taker completes, count the number of actions that it took.
• “As compared to human baseline” - For each level that is counted, compare the AI agent’s action count to a human baseline, which we define as the second-best human action count.
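To make the two quoted steps concrete, here is a rough Python sketch. The quote only says that the AI's per-level action count is compared to the second-best human action count; it does not say how the comparison is turned into a number. The ratio `baseline / ai_actions` and the cap at 1.0 below are my own assumptions for illustration, not the paper's formula, and the function names and sample numbers are made up.

```python
def second_best_baseline(human_action_counts):
    """Return the second-lowest action count among human attempts on one level.

    "Second-best" is read here as the second-fewest actions (fewer is better).
    Ties are kept, so [30, 30, 40] gives a baseline of 30.
    """
    if len(human_action_counts) < 2:
        raise ValueError("need at least two human attempts to take the second-best")
    return sorted(human_action_counts)[1]


def level_efficiency(ai_actions, human_action_counts):
    """Compare the AI's action count on a completed level to the human baseline.

    ASSUMPTION: the comparison is the ratio baseline / ai_actions, capped at 1.0.
    The quoted passage does not specify this; it is only one plausible reading.
    """
    if ai_actions <= 0:
        raise ValueError("ai_actions must be positive for a completed level")
    baseline = second_best_baseline(human_action_counts)
    return min(1.0, baseline / ai_actions)


# Hypothetical numbers: four humans solved the level in these many actions.
human_counts = [42, 30, 55, 38]
print(second_best_baseline(human_counts))   # baseline used for this level
print(level_efficiency(76, human_counts))   # AI needed 76 actions on the same level
```

Under these assumptions, an AI that needs twice as many actions as the second-best human gets half credit on that level, and matching or beating the baseline gives full credit. Only levels the AI actually completes are counted, per the quote.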