AI RESEARCH
ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation
arXiv CS.AI
arXiv:2603.29902v1
Interleaved text-and-image generation represents a significant frontier for Multimodal Large Language Models (MLLMs), offering an intuitive way to convey complex information. Current paradigms rely on either image generation or retrieval augmentation, yet they typically treat the two as mutually exclusive paths and so fail to unify factuality with creativity.