AI RESEARCH

QHyer: Q-conditioned Hybrid Attention-Mamba Transformer for Offline Goal-conditioned RL

arXiv CS.LG

arXiv:2605.01862v1 Announce Type: new Offline goal-conditioned RL (GCRL) learns goal-reaching policies from static datasets, but real-world datasets are often partially observable and history-dependent, exhibiting a mix of Markovian and non-Markovian structure that violates standard RL assumptions. History-aware sequence models such as Decision Transformer (DT) are a natural fit for modeling long-term dependencies, yet pure attention is inefficient and brittle when it must handle local Markovian structure and long-range context simultaneously. Although recent hybrid architectures (e.g.