Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

ArXi:2603.03565v2 Announce Type: replace Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly coupled multi-agent systems. Grocery shopping further amplifies these difficulties, as user requests are often underspecified, highly preference-sensitive, and constrained by factors such as budget and inventory.