Prototype Guided Post-pretraining for Single-Cell Representation Learning

arXiv cs.LG

arXiv:2605.07938v1 Announce Type: new

Single-cell representation learning (SCRL) from gene expression data offers a way to uncover the complex regulatory logic underlying cellular function. Inspired by large language models (LLMs), several single-cell pretrained models have recently been proposed that treat genes as tokens and cells as sentences. However, these models are fundamentally limited by the long-tailed nature of cell-type distributions and struggle to generalize under covariate shifts in gene expression data.
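To make the "genes as tokens, cells as sentences" analogy concrete, here is a minimal sketch of one way a cell's expression vector could be serialized into a sequence of gene tokens. It assumes a rank-based ordering (unexpressed genes dropped, the rest sorted from highest to lowest expression), similar in spirit to rank-value encodings used by some single-cell foundation models. The function names `cell_to_sentence` and `to_token_ids`, the `max_len` cutoff, and the gene values are hypothetical illustrations, not the tokenization used in this paper.

```python
from typing import Dict, List, Sequence


def cell_to_sentence(
    genes: Sequence[str], expression: Sequence[float], max_len: int = 2048
) -> List[str]:
    """Serialize one cell into an ordered list of gene tokens.

    Hypothetical rank-based scheme (an assumption, not the paper's method):
    genes with zero expression are dropped, and the remaining genes are
    ordered from highest to lowest expression, truncated to max_len.
    """
    if len(genes) != len(expression):
        raise ValueError("genes and expression must have the same length")
    # Single-cell profiles are sparse, so keep only expressed genes.
    expressed = [(g, x) for g, x in zip(genes, expression) if x > 0]
    # Highest expression first; Python's sort is stable even with
    # reverse=True, so tied genes keep their original order.
    expressed.sort(key=lambda pair: pair[1], reverse=True)
    return [g for g, _ in expressed[:max_len]]


def to_token_ids(sentence: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Map gene tokens to integer ids using a hypothetical gene vocabulary."""
    return [vocab[g] for g in sentence]


if __name__ == "__main__":
    genes = ["CD3E", "MS4A1", "GAPDH", "LYZ"]
    expr = [5.0, 0.0, 12.0, 5.0]
    vocab = {g: i for i, g in enumerate(genes)}
    sentence = cell_to_sentence(genes, expr)
    print(sentence)  # ['GAPDH', 'CD3E', 'LYZ']
    print(to_token_ids(sentence, vocab))  # [2, 0, 3]
```

Ordering by expression lets a sequence model read relative gene activity from token position alone, which is one reason rank-based serializations are a natural fit for the token analogy; other schemes (for example, binning expression values into separate value tokens) are equally plausible.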