Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss
arXiv CS.CL
arXiv:2603.22799v1 Figurative language comes in many varieties, some of which are non-compositional. Such phrases, or multi-word expressions (MWEs), include idioms, each of which conveys a single meaning that is not the sum of its words. For language models, this poses a distinct problem because of tokenization and adjacent contextual embeddings. Many large language models have overcome this issue with a large phrase vocabulary, though immediate recognition frequently fails without one- or few-shot prompting or instruction finetuning.