ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

arXiv CS.AI

arXiv:2512.09427v3 Announce Type: replace-cross Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth. While static pre-allocation preserves memory contiguity, it incurs significant overhead due to worst-case provisioning. Conversely, fine-grained paging mitigates this overhead but relies on HBM's high tolerance for random access, making it unsuitable for LPDDR systems, where non-sequential access rapidly degrades bandwidth.
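
The trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the paper: it compares how many KV-cache token slots are reserved under worst-case static pre-allocation and under block-granular paging. The function names, the request lengths, `max_len`, and `block_size` are all illustrative assumptions.

```python
import math

def static_reserved(lengths, max_len):
    """Token slots reserved when every request gets one contiguous max_len slab."""
    return len(lengths) * max_len

def paged_reserved(lengths, block_size):
    """Token slots reserved when each request gets ceil(len / block_size) blocks."""
    return sum(math.ceil(n / block_size) * block_size for n in lengths)

def waste_fraction(reserved, lengths):
    """Share of reserved slots that no request actually uses."""
    return (reserved - sum(lengths)) / reserved

# Hypothetical workload: actual sequence lengths of five requests, in tokens.
lengths = [120, 900, 35, 2048, 410]
max_len = 2048     # assumed worst-case length used for static provisioning
block_size = 16    # assumed page size, in tokens

static_total = static_reserved(lengths, max_len)
paged_total = paged_reserved(lengths, block_size)

print(f"static: {static_total} slots, waste {waste_fraction(static_total, lengths):.1%}")
print(f"paged:  {paged_total} slots, waste {waste_fraction(paged_total, lengths):.1%}")
```

On this made-up workload, static provisioning leaves roughly two thirds of its reserved slots unused, while paging wastes about one percent. The sketch counts capacity only: it does not model the scattered block placement that, according to the abstract, makes paging a poor fit for LPDDR-class memory.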