AI RESEARCH
Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models
arXiv CS.AI
arXiv:2601.01162v2 Announce Type: replace-cross Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categorical data clustering is measuring similarity among attribute values that lack any inherent ordering or distance. Without an appropriate similarity measure, distinct values are often treated as equidistant, creating a semantic gap that obscures latent structure and degrades clustering quality.
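The equidistance problem described above can be illustrated with a minimal sketch. It is not the paper's method; it only shows a common baseline (one-hot encoding with Hamming distance), and the attribute values and the helper names `one_hot` and `hamming` are hypothetical choices made for illustration.

```python
from itertools import combinations


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over the vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]


def hamming(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical attribute values: the first two are semantically close
# (both medical specialties), the third is unrelated.
vocabulary = ["cardiology", "neurology", "billing"]
encoded = {v: one_hot(v, vocabulary) for v in vocabulary}

# Distance between every pair of distinct values.
pairwise = {
    (a, b): hamming(encoded[a], encoded[b])
    for a, b in combinations(vocabulary, 2)
}

for pair, dist in pairwise.items():
    print(pair, dist)
```

Every pair of distinct values ends up at the same distance under this encoding, so "cardiology" is no closer to "neurology" than it is to "billing". That loss of meaning is the semantic gap the abstract refers to.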