LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

arXiv:2603.28082v1 Announce Type: new Generating coherent and communicative visual sequences, such as image sequences and videos, remains a significant challenge for current multimodal systems. Despite advances in visual quality and the integration of world knowledge, existing models still struggle to maintain logical flow, often resulting in disjointed actions, fragmented narratives, and unclear storylines.