A Heterogeneous Two-Stream Framework for Video Action Recognition with Comparative Fusion Analysis
arXiv CS.CV
arXiv:2604.23415v1 Announce Type: new
Most two-stream action recognition networks apply the same convolutional backbone to both the RGB and optical flow streams, even though the two modalities have fundamentally different structural properties. Optical flow captures fine-grained motion patterns, while RGB frames carry rich appearance and scene context; treating them identically discards this distinction.