Designed a photonic chip for O(1) KV cache block selection — 944x faster, 18,000x less energy than GPU scan at 1M context
r/LocalLLaMA
AI Hardware
I’m a nanophotonics PhD student, and I think photonic chips can solve the KV cache scanning bottleneck.

Block-sparse methods like Quest/RocketKV reduce the number of blocks fetched, but they still scan all N block signatures from HBM on every decode step. That scan is O(N): at 1M context on an H100, it’s ~8.5μs per query. In batch serving this becomes the dominant cost.

PRISM replaces the scan with an optical broadcast: query encoded as light → split to all N blocks simultaneously via a passive splitter → each block’s signature stored as MRR (microring resonator) weights → all similarity scores computed at once. O(1) regardless of N.
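To make the baseline concrete, here's a rough sketch of the electronic scan that the optical broadcast is meant to replace. This is not PRISM's or Quest's actual code: the function names, the dot-product similarity, the random signatures, and the sizes (`N`, `d`, `k`) are all assumptions for illustration only.

```python
import heapq
import random


def score_blocks(query, signatures):
    """Dot-product score of the query against every block signature.

    This loop visits all N signatures, so it is the O(N) scan.
    """
    return [sum(q * s for q, s in zip(query, sig)) for sig in signatures]


def select_blocks(query, signatures, k):
    """Return the indices of the k highest-scoring blocks, best first."""
    scores = score_blocks(query, signatures)
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


# Hypothetical sizes: N blocks, d-dimensional signatures, keep the top k.
random.seed(0)
N, d, k = 1024, 32, 8
signatures = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(d)]

selected = select_blocks(query, signatures, k)
print(selected)
```

In this sketch, `score_blocks` is the step whose cost grows with N. The optical version described above would produce all N scores at once instead of looping over them.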