Do you still need to caption "environment, lighting and tone, image style, objects" when training a Z-image model?

r/StableDiffusion

Sorry, I'm just coming back from an older era. I see that Z-image follows prompts much better nowadays. A year ago, people told me I should caption every detail, including the person's posture, objects, the house, the stage, and also the lighting and tone. Otherwise, whenever I mention this person, they will always come with the same house and the same image style, even though I didn't specify them. Nowadays, people still tell me to do the same, using tools like QwenVL to caption everything in as much detail as possible.