AI RESEARCH
SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling
arXiv cs.LG
arXiv:2604.12110v1
Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, which compromises serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding.