PaceVGGT: Pre-Alternating-Attention Token Pruning for Visual Geometry Transformers

ArXi:2605.08371v1 Announce Type: new Visual Geometry Transformer (VGGT) is a strong feed-forward model for multiple 3D tasks, but its Alternating-Attention (AA) stack scales quadratically in the total token count, making long clips expensive. Existing token-reduction accelerators operate inside AA, leaving the patch grid that enters AA uncompressed. A per-frame keep budget fixes the backbone-visible sequence length, while an importance-adaptive merge/prune assignment preserves residual content from high-saliency frames under a fixed total merge budget.