HCInfer: An Efficient Inference System via Error Compensation for Resource-Constrained Devices

arXiv:2605.05819v1

Large language models (LLMs) are difficult to deploy on memory-constrained consumer-grade hardware because of their massive parameter counts. Existing solutions such as model compression and offloading make deployment more feasible, but they often incur substantial accuracy degradation or severe throughput bottlenecks.