qwen3_4b_fp8_scaled vs. z_image_turbo_fp8_e4m3fn and flux-2-klein-4b-fp8

Can anyone explain the following, and then tell me whether there is anything I can do to reduce the time it takes to process the prompt before it is sent to the KSampler? Z Turbo is not an issue in this case, but Flux 2 Klein 4B is. The first thing to note is that, no matter how you look at it, the text encoder simply won't fit into VRAM on my system. Yet the same text encoder that both Z Turbo and Flux 2 Klein 4B use, qwen3_4b_fp8_scaled.safetensors, processes the prompt considerably faster in Z Turbo than it does in Flux 2 Klein 4B on my hardware.