AI RESEARCH
PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents
arXiv CS.CV
arXiv:2512.14735v2 Announce Type: replace-cross
This paper proposes PyFi, a framework for pyramid-like financial image understanding that enables vision language models (VLMs) to reason through question chains progressively, from simple to complex. At the core of PyFi is PyFi-600K, a dataset of 600K financial question-answer pairs organized into a reasoning pyramid: questions at the base require only basic perception, while those toward the apex demand progressively more financial visual understanding and domain expertise.
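To make the pyramid idea concrete, here is a minimal Python sketch of how one question chain might be represented and walked from base to apex. It is an illustration only, not the PyFi-600K schema: the class names `QAPair` and `QuestionChain`, the integer `level` field, the `image_id` value, and the sample questions and answers are all hypothetical placeholders that do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    """One question-answer pair (hypothetical layout, not the dataset's schema)."""
    level: int      # assumption: 0 = base (basic perception), larger = nearer the apex
    question: str
    answer: str


@dataclass
class QuestionChain:
    """All question-answer pairs attached to a single financial image."""
    image_id: str
    pairs: List[QAPair] = field(default_factory=list)

    def ordered(self) -> List[QAPair]:
        # Return the chain in simple-to-complex order (base of the pyramid first).
        return sorted(self.pairs, key=lambda p: p.level)


# Invented placeholder content, deliberately listed out of order.
chain = QuestionChain(
    image_id="chart_001",
    pairs=[
        QAPair(2, "What might the trend imply for next quarter?", "placeholder"),
        QAPair(0, "What type of chart is shown?", "placeholder"),
        QAPair(1, "Which period has the highest value?", "placeholder"),
    ],
)

levels = [p.level for p in chain.ordered()]
print(levels)
```

Sorting by level is just one simple way to recover a simple-to-complex ordering; the actual dataset may encode dependencies between questions differently.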