AI RESEARCH

PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

arXiv cs.CV

arXiv:2512.14735v2 Announce Type: replace-cross

This paper proposes PyFi, a novel framework for pyramid-like financial image understanding that enables vision language models (VLMs) to reason through question chains progressively, from simple to complex. At the core of PyFi is PyFi-600K, a dataset of 600K financial question-answer pairs organized into a reasoning pyramid: questions at the base require only basic perception, while those toward the apex demand progressively greater financial visual understanding and domain expertise.
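To make the pyramid idea concrete, here is a minimal sketch of how a question chain for a single image might be ordered from base to apex. It is an illustration only, not the PyFi-600K schema: the `QAPair` class, the integer `level` field, the `build_chain` helper, and the sample questions are all hypothetical, and the abstract does not say how levels are encoded or how many there are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair (hypothetical structure, not the real dataset schema)."""
    question: str
    answer: str
    level: int  # assumption: 0 = base (basic perception), larger = closer to the apex


def build_chain(pairs):
    """Order the QA pairs for one image from simple to complex."""
    return sorted(pairs, key=lambda p: p.level)


# Made-up sample questions; answers are placeholders, not dataset content.
pairs = [
    QAPair("Does the plotted trend suggest a sustained rally?", "placeholder", 2),
    QAPair("What type of chart is shown?", "placeholder", 0),
    QAPair("What is the highest value on the y-axis?", "placeholder", 1),
]

chain = build_chain(pairs)
levels = [p.level for p in chain]

for step in chain:
    print(step.level, step.question)
```

Sorting by a single integer level is the simplest reading of "simple-to-complex"; a dataset with explicit dependencies between questions would need a graph-based ordering instead.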