NERdME: a Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories

arXiv:2603.05750v1 Announce Type: new Existing scholarly information extraction (SIE) datasets focus on scientific papers and overlook implementation-level details in code repositories. README files describe datasets, source code, and other implementation-level artifacts; however, their free-form Markdown offers little semantic structure, which makes automatic information extraction difficult. To address this gap, NERdME is