AI RESEARCH
SISA: A Scale-In Systolic Array for GEMM Acceleration
arXiv cs.AI
arXiv:2603.29913v1 Announce Type: cross The currently dominant AI/ML workloads, such as Large Language Models (LLMs), rely on the efficient execution of General Matrix-Matrix Multiplication (GEMM) operations. Thus, most systems are equipped with dedicated matrix hardware accelerators based on square Systolic Arrays (SAs) of Processing Elements (PEs). While this organization was effective for traditional Deep Neural Networks (DNNs), LLMs