AI RESEARCH

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

arXiv cs.AI

arXiv:2604.11512v1 Announce Type: cross The growing demand for deploying Small Language Models (SLMs) on edge devices, including laptops, smartphones, and embedded platforms, has exposed fundamental inefficiencies in existing accelerators. While GPUs handle prefill workloads efficiently, the autoregressive decoding phase is dominated by GEMV operations that are inherently memory-bound, resulting in poor utilization and prohibitive energy costs at the edge.
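
The memory-bound nature of GEMV can be illustrated with a rough arithmetic-intensity estimate (FLOPs per byte moved). The sketch below is not taken from the paper: the layer width of 4096, the 512-token prompt, the 16-bit element size, and the assumption that each operand crosses the memory interface exactly once are all hypothetical values chosen for illustration.

```python
def arithmetic_intensity(m, k, n, bytes_per_element=2):
    """FLOPs per byte for an (m x k) weight matrix times a (k x n) activation block.

    Assumes every operand is moved to or from memory exactly once,
    which is an idealized (best-case) traffic model.
    """
    flops = 2 * m * k * n  # one multiply and one add per weight-activation pair
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)  # weights + inputs + outputs
    return flops / bytes_moved


HIDDEN = 4096         # hypothetical layer width, not from the paper
PROMPT_TOKENS = 512   # hypothetical prompt length processed during prefill

# Decoding handles one token at a time, so the activation block is a single column (GEMV).
decode_ai = arithmetic_intensity(HIDDEN, HIDDEN, 1)

# Prefill handles the whole prompt at once, so the same weights are reused across tokens (GEMM).
prefill_ai = arithmetic_intensity(HIDDEN, HIDDEN, PROMPT_TOKENS)

print(f"decode  (GEMV): {decode_ai:.2f} FLOPs/byte")
print(f"prefill (GEMM): {prefill_ai:.2f} FLOPs/byte")
```

Under these assumptions, the GEMV case performs roughly one FLOP per byte fetched, while the prefill GEMM reuses each weight across all prompt tokens and reaches several hundred FLOPs per byte. That gap is why decoding tends to be limited by memory bandwidth rather than by compute.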