AI RESEARCH
Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse
arXiv CS.LG
•
ArXi:2604.06228v1 Announce Type: new The central technical result is a prior-guided caching theorem: under a stationary generative distribution, a PLT-guided cache achieves strictly lower expected inference cost than any empirical-frequency cache for all query counts below a threshold that grows with the concentration of the prior. This converts O(n^2) transformer attention cost into an expected cost of p_r * O(log N) + (1 - p_r) * O(n^2), where p_r is the prior-estimated reuse probability and N is the artifact size.