
The Structured Output Benchmark (SOB) - validates both JSON parse and value accuracy [R]

r/MachineLearning

Current structured output benchmarks only validate the pass rate for JSON schema and types, but the more common problem is inaccurate JSON values: for example, a hallucinated `total_price` when extracting values from an invoice, or an array ordered wrongly because of inaccurate date mapping. The Structured Output Benchmark measures 7 key metrics rather than schema compliance alone.
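To make the distinction concrete, here is a minimal Python sketch contrasting a schema/type check with a field-level value check on the invoice example. The schema, the field names other than `total_price`, the exact-match scoring, and the sample data are all hypothetical. This is not one of SOB's 7 metrics, which the post does not enumerate here. It only shows how an output can pass schema validation and still be mostly wrong.

```python
import json

# Hypothetical expected field types for an invoice extraction task (illustrative only).
SCHEMA_TYPES = {"invoice_id": str, "total_price": float, "line_items": list}


def schema_pass(raw: str) -> bool:
    """Shallow check: output parses as JSON and each field has the expected type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(
        isinstance(data.get(key), typ) for key, typ in SCHEMA_TYPES.items()
    )


def value_accuracy(raw: str, gold: dict) -> float:
    """Fraction of gold fields whose extracted value exactly matches the reference."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    matches = sum(1 for key, value in gold.items() if data.get(key) == value)
    return matches / len(gold)


# Hypothetical reference extraction: line items are ordered by date.
gold = {
    "invoice_id": "INV-001",
    "total_price": 1250.5,
    "line_items": [
        {"date": "2024-01-03", "amount": 500.5},
        {"date": "2024-01-10", "amount": 750.0},
    ],
}

# Hypothetical model output: valid types, but a hallucinated total_price
# and line items in the wrong order.
model_output = json.dumps({
    "invoice_id": "INV-001",
    "total_price": 1520.5,
    "line_items": [
        {"date": "2024-01-10", "amount": 750.0},
        {"date": "2024-01-03", "amount": 500.5},
    ],
})

print(schema_pass(model_output))           # True
print(value_accuracy(model_output, gold))  # 0.3333333333333333
```

A schema-only benchmark would score this output as a pass, while the value check flags two of the three fields as wrong. Exact matching is the simplest choice; a real benchmark would likely need tolerances for numbers and order-aware or order-insensitive comparison depending on the field.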