BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets

arXiv cs.CL

arXiv:2604.26048v1

This paper presents a principled and scalable framework for systematically generating complex Question Answering (QA) data. At the core of this framework is a graphlet-anchored generation process, in which small subgraphs of a Knowledge Graph (KG) are placed in a structured prompt to control the complexity and ensure the factual grounding of questions generated by Large Language Models (LLMs). The first instantiation of this framework is BioGraphletQA, a new biomedical KGQA dataset of 119,856 QA pairs.
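
To make the graphlet-anchored idea concrete, here is a minimal sketch of how a small connected subgraph might be pulled from a KG and rendered into a structured prompt. It is an illustration under stated assumptions, not the paper's pipeline: the toy triples, the breadth-first extraction strategy, the prompt wording, and the names `TOY_KG`, `extract_graphlet`, and `build_prompt` are all hypothetical, and the call to an LLM is left out.

```python
from collections import defaultdict

# Hypothetical toy KG of (head, relation, tail) triples; not taken from the paper's data.
TOY_KG = [
    ("metformin", "treats", "type 2 diabetes"),
    ("metformin", "activates", "AMPK"),
    ("AMPK", "regulates", "glucose uptake"),
    ("type 2 diabetes", "associated_with", "insulin resistance"),
]


def extract_graphlet(triples, seed, max_edges=3):
    """Grow a connected subgraph from `seed` breadth-first, up to `max_edges` triples.

    Assumption: breadth-first growth stands in for whatever graphlet
    enumeration the framework actually uses.
    """
    by_node = defaultdict(list)
    for triple in triples:
        by_node[triple[0]].append(triple)
        by_node[triple[2]].append(triple)

    graphlet, seen_nodes, frontier = [], {seed}, [seed]
    while frontier and len(graphlet) < max_edges:
        node = frontier.pop(0)
        for triple in by_node[node]:
            if triple in graphlet:
                continue
            if len(graphlet) >= max_edges:
                break
            graphlet.append(triple)
            # Queue newly reached nodes so the subgraph stays connected.
            for neighbor in (triple[0], triple[2]):
                if neighbor not in seen_nodes:
                    seen_nodes.add(neighbor)
                    frontier.append(neighbor)
    return graphlet


def build_prompt(graphlet):
    """Render the graphlet as a structured prompt (hypothetical wording)."""
    facts = "\n".join(f"- {h} --[{r}]--> {t}" for h, r, t in graphlet)
    return (
        "Using ONLY the facts below, write one question that requires "
        f"combining all {len(graphlet)} facts, then give its answer.\n"
        f"Facts:\n{facts}\n"
    )


graphlet = extract_graphlet(TOY_KG, seed="metformin", max_edges=3)
prompt = build_prompt(graphlet)
print(prompt)
```

In this sketch the number of edges in the graphlet is the knob for question complexity, and restricting the prompt to the listed triples is what keeps the generated question tied to the KG.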