LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

arXiv:2511.06174v2 The rapid development of large language models (LLMs) has greatly enhanced everyday applications. Many FPGA-based accelerators, with their flexibility for fine-grained data control, have shown better speed and energy efficiency than GPUs, but recent GPU-specific optimizations have diminished this advantage. When limited to arithmetic-based computation, FPGAs often underperform GPUs because they have comparatively fewer computational resources.