3x Faster Video Inference Without Touching the Model

Sometimes computer vision model inference runs so fast that it is not even close to being the bottleneck. Let’s look at running the D-FINE “s” model on a video: where the bottlenecks are and how to speed things up. I’ll share some concepts and code. All experiments were run within the D-FINE-seg framework and can be reproduced. Check out the D-FINE-seg repo for that, specifically this script. Note that I use D-FINE just as an example; the approach is generic and can be applied to other models.