From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding

arXiv:2603.15270v1 Announce Type: cross ICD coding is a critical yet challenging task in healthcare. Recently, LLM-based methods have demonstrated stronger generalization than discriminative methods in ICD coding. However, fine-tuning LLMs for ICD coding faces three major challenges. First, existing public ICD coding datasets cover only a limited portion of the ICD code space, which restricts a model's ability to generalize to unseen codes. Second, naive fine-tuning diminishes the interpretability of LLMs, as few public datasets contain explicit supporting evidence for assigned codes.