AI RESEARCH

DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams

arXiv CS.CL

arXiv:2511.17693v2 Announce Type: replace-cross Transformer-based models have grown dramatically in size and parameter count to tackle increasingly complex tasks. At the same time, there is growing demand for high-performance, low-latency inference on resource-constrained devices. In particular, inference on streaming data is typically performed over a sliding temporal window, so consecutive windows overlap and much of the computation is redundant.
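
To make the redundancy claim concrete, here is a minimal sketch that counts how often a naive sliding-window scheme re-processes tokens it has already seen. It is only an illustration of the problem described in the abstract, not the DeepCoT method; the function names `sliding_windows` and `redundancy`, and the choice of window length and stride, are assumptions made for this example.

```python
def sliding_windows(stream, window, stride=1):
    """Yield every full window of `window` consecutive items from `stream`."""
    for start in range(0, len(stream) - window + 1, stride):
        yield stream[start:start + window]


def redundancy(stream_len, window, stride=1):
    """Return (token visits, distinct tokens, redundant fraction).

    Assumes a naive model that re-processes every token of every window,
    with per-token cost treated as one unit of work (a simplification).
    """
    stream = list(range(stream_len))
    visits = 0
    distinct = set()
    for w in sliding_windows(stream, window, stride):
        visits += len(w)      # work done on this window
        distinct.update(w)    # tokens that actually needed processing once
    if visits == 0:
        return 0, 0, 0.0      # stream shorter than one window
    return visits, len(distinct), 1 - len(distinct) / visits


if __name__ == "__main__":
    visits, distinct, frac = redundancy(stream_len=100, window=16, stride=1)
    print(f"{visits} token visits for {distinct} distinct tokens "
          f"({frac:.1%} redundant)")
```

With a stride of 1, each window shares all but one token with its predecessor, so the redundant fraction approaches 1 - 1/window on long streams. Increasing the stride reduces the overlap, at the cost of producing outputs less often.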