Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation

arXiv:2604.10180v1 Announce Type: cross

Disaggregation maps parts of an AI workload to different types of GPUs, offering a way to make effective use of modern heterogeneous GPU clusters. However, existing solutions operate at a coarse granularity and are tightly coupled to specific model architectures, leaving considerable room for performance improvement. This paper presents Tessera, the first kernel disaggregation system to improve performance and cost efficiency on heterogeneous GPUs for large model inference.