AI RESEARCH

Semantic Centroids and Hierarchical Density-Based Clustering for Cross-Document Software Coreference Resolution

arXiv CS.CL

arXiv:2603.24246v1 Announce Type: new This paper describes the system submitted to the SOMD 2026 Shared Task for Cross-Document Coreference Resolution (CDCR) of software mentions. Our approach addresses the challenge of identifying and clustering inconsistent software mentions across scientific corpora. We propose a hybrid framework that combines dense semantic embeddings from a pre-trained Sentence-BERT model, a Knowledge Base (KB) lookup strategy built from