Need clarification on how QWEN Image Edit likes its input images formatted for Ref / VAE and VL
r/StableDiffusion
About half my inputs are a single person standing, so 512x1152 is quite common for me after I crop out dead space. I'm having trouble finding out how picky the VAE and VL are about dimensions, and my testing hasn't really helped.

For the REF image, I just make sure the height and width are both divisible by 64 and the total pixel count is equal to or less than 1MP, so that 512x1152 would just be left as-is. Or should I be padding it and scaling to exactly 1024x1024? Or upscaling the 512x1152 to be exactly 1MP? Then for VL I have it at 384 with no crop.
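For anyone comparing the options, here's a minimal Python sketch of the size arithmetic only: the REF rule described above, padding into 1024x1024, scaling toward 1MP, and a 384-based VL size. Assumptions, not confirmed by anything Qwen-specific: "1MP" is read as 1024*1024 pixels, and "VL at 384" is read as a total area of 384*384 pixels with the aspect ratio kept. All function names are hypothetical and not part of any Qwen or ComfyUI API. It says nothing about which size the model actually prefers.

```python
import math

# Assumption: "1MP" means 1024 * 1024 pixels (some tools use 1,000,000).
ONE_MP = 1024 * 1024


def fits_ref_rule(width: int, height: int, multiple: int = 64,
                  max_pixels: int = ONE_MP) -> bool:
    """REF rule from the post: both sides divisible by `multiple`, area <= max_pixels."""
    return (width % multiple == 0 and height % multiple == 0
            and width * height <= max_pixels)


def scale_to_area(width: int, height: int, target: int = ONE_MP,
                  multiple: int = 64) -> tuple[int, int]:
    """Scale toward `target` pixels keeping the aspect ratio, then snap each side
    down to a multiple of `multiple`, so the area does not exceed `target`.

    Uses integer math: floor(w * sqrt(T / A)) == isqrt(w * w * T // A),
    which avoids float rounding surprises.
    """
    area = width * height
    new_w = math.isqrt(width * width * target // area)
    new_h = math.isqrt(height * height * target // area)
    return (max(multiple, new_w // multiple * multiple),
            max(multiple, new_h // multiple * multiple))


def pad_to_square(width: int, height: int,
                  side: int = 1024) -> tuple[int, int, int, int]:
    """Fit the image inside side x side without cropping.

    Returns (scaled_w, scaled_h, pad_left, pad_top); any odd leftover
    padding pixel goes to the right/bottom edge.
    """
    longest = max(width, height)
    new_w = width * side // longest
    new_h = height * side // longest
    return new_w, new_h, (side - new_w) // 2, (side - new_h) // 2


def vl_size(width: int, height: int, side: int = 384) -> tuple[int, int]:
    """Assumption: "384" means a total area of side * side pixels,
    aspect ratio preserved, rounded to whole pixels."""
    scale = math.sqrt(side * side / (width * height))
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    w, h = 512, 1152
    print(fits_ref_rule(w, h))   # True
    print(scale_to_area(w, h))   # (640, 1536)
    print(pad_to_square(w, h))   # (455, 1024, 284, 0)
    print(vl_size(w, h))         # (256, 576)
```

One thing the numbers make clear: "exactly 1MP" isn't reachable for this shape. A 4:9 image with both sides divisible by 64 has to be 256k x 576k, so under 1MP the largest aspect-exact size is 512x1152 itself (the next step, 768x1728, is about 1.33MP). The 640x1536 result above stays under 1MP only because each side is snapped separately, which shifts the aspect ratio from about 0.444 to about 0.417.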