40s generation time for 10s vid on a 5090 using custom runtime (ltx 2.3) (closed project, will open source soon)

Heya! just wanted to share a milestone. context: this is an inference engine written in rust™. right now the denoise stage is fully rust-native, and i’ve also been working on the surrounding bottlenecks, even though i still use a python bridge on some colder paths. this raccoon clip is a raw test from the current build. by bypassing python on the hot paths and doing some aggressive memory management, i'm getting full 10s generations in under 40 seconds! i started with LTX-2 and i'm currently tweaking the pipeline so LTX-2.3 fits and runs smoothly.