Show HN: LOAB – AI agents get decisions right but skip the process [pdf]

Hacker News (AI)
Generative AI AI Research

LOAB, an open-source benchmark for evaluating whether AI agents can follow regulated lending processes - not just produce the right final answer. The motivation is simple: in mortgage lending, regulators don't care if you got the right answer. They care whether you followed the right process. Skip a KYC check, pull a credit bureau report before getting privacy consent, or approve a loan without the required policy lookup - that's a compliance failure even if the outcome was correct. Current AI benchmarks don't measure this. They evaluate what the agent decided, not how it got there.