ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents

arXiv:2604.02834v1

Longitudinal health agents must reason across multi-source trajectories that combine continuous device streams, sparse clinical exams, and episodic life events. Yet evaluating them is hard: real-world data cannot be released at scale, and temporally grounded attribution questions seldom admit definitive answers without structured ground truth.