Complete Evidence Extraction with Model Ensembles: A Case Study on Medical Coding

arXiv:2511.07055v3

High-stakes decisions informed by decision systems require explicit evidence. While prior work focuses on short, sufficient evidence, regulatory compliance and medical billing call for complete evidence: all relevant input tokens that support a decision. We formulate complete evidence extraction as a task and study it in a medical coding setting. Motivated by the Rashomon effect, the observation that different models of similar accuracy can rely on different parts of the input, we aggregate token-level evidence from multiple language models to increase evidence completeness.
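The abstract does not say how the per-model evidence is combined, so the following is only a minimal sketch under stated assumptions. It assumes each model's evidence for one code is a set of token indices, that aggregation is a vote threshold (one vote is the plain union), and that completeness is measured as token-level recall against a reference evidence set. The names `aggregate_evidence` and `completeness` and the toy token indices are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Set


def aggregate_evidence(per_model: Iterable[Set[int]], min_votes: int = 1) -> Set[int]:
    """Keep token indices flagged as evidence by at least `min_votes` models.

    Assumption: min_votes=1 (the plain union) favors completeness,
    while higher thresholds trade completeness for agreement.
    """
    votes: Counter = Counter()
    for tokens in per_model:
        votes.update(set(tokens))  # each model votes at most once per token
    return {tok for tok, n in votes.items() if n >= min_votes}


def completeness(predicted: Set[int], reference: Set[int]) -> float:
    """Token-level recall of predicted evidence against a reference set."""
    if not reference:
        return 1.0
    return len(predicted & reference) / len(reference)


# Hypothetical token indices marked as evidence by three models for one code.
model_a = {3, 4, 10}
model_b = {4, 10, 17}
model_c = {10, 22}
reference = {3, 4, 10, 17, 22, 30}

union = aggregate_evidence([model_a, model_b, model_c])
majority = aggregate_evidence([model_a, model_b, model_c], min_votes=2)
print(sorted(union), completeness(union, reference))
print(sorted(majority), completeness(majority, reference))
```

In this toy example, no single model recovers more than half of the reference tokens. The union recovers five of six, while a majority vote keeps only the two tokens most models agree on. This illustrates why pooling models whose evidence differs can raise completeness.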