From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction

arXiv:2605.11774v1 Announce Type: cross By processing electronic health records (EHRs) as natural language sequences, large language models (LLMs) have shown potential in clinical prediction tasks such as mortality prediction and phenotyping. However, longitudinal or high-frequency EHRs often yield excessively long token sequences that incur high computational costs and can even reduce performance. Existing solutions either add modules for compression or remove less important tokens, which