FlashDrive: Flash Vision-Language-Action Inference For Autonomous Driving (8 minute read)
TLDR AI
•
Robotics
AI Research
FlashDrive is an algorithm-system co-design framework that reduces end-to-end latency to 159ms with negligible accuracy loss. VLA inference is a cascade of stages, each hiding a different form of redundancy. Temporal overlap in vision, low entropy in reasoning, velocity smoothness in flow matching, and numerical headroom in weights each yield to a targeted shortcut. The speedups compound to 4.5x with negligible accuracy loss because the redundancies are orthogonal.