Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

arXiv:2604.09414v1 Announce Type: cross Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice.
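
As a rough illustration of the decision rule described above, the sketch below contrasts routing over experts alone with routing over (expert, advice) pairs. It assumes the per-input expected costs are already known, whereas in practice they would have to be estimated. The function names, expert and advice labels, and cost values are all hypothetical, and this is not presented as the method of the paper.

```python
from typing import Dict, Tuple


def route(expected_cost: Dict[str, float]) -> str:
    """Standard deferral: pick the expert with the lowest expected cost."""
    return min(expected_cost, key=expected_cost.get)


def route_with_advice(
    expected_cost: Dict[Tuple[str, str], float]
) -> Tuple[str, str]:
    """Deferral with advice: pick the (expert, advice) pair with the
    lowest expected cost. Each cost is assumed to already include the
    price of acquiring the advice."""
    return min(expected_cost, key=expected_cost.get)


# Hypothetical expected costs for a single input.
costs = {"model": 0.40, "human": 0.25}

# Hypothetical expected costs when the advice given to each expert
# can also be chosen.
costs_with_advice = {
    ("model", "none"): 0.40,
    ("model", "retrieved_docs"): 0.15,
    ("human", "none"): 0.25,
    ("human", "escalation_context"): 0.20,
}

chosen_expert = route(costs)
chosen_pair = route_with_advice(costs_with_advice)
```

With these illustrative numbers, the expert-only rule defers to the human, while allowing advice makes the model with retrieved documents the cheapest option. This shows how the best expert can change once the information it receives is itself a decision.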