AI RESEARCH
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
arXiv CS.LG
arXiv:2605.01862v1 Announce Type: new Offline goal-conditioned RL (GCRL) learns goal-reaching policies from static datasets, but real-world datasets are often partially observable and history-dependent, exhibiting a mix of Markovian and non-Markovian dynamics that violates standard RL assumptions. History-aware sequence models such as Decision Transformer (DT) are a natural fit for modeling long-term dependencies, yet pure attention is inefficient and brittle when it must handle local Markovian structure and long-range context simultaneously. Although recent hybrid architectures (e.g.