Quantifying Memorization and Privacy Risks in Genomic Language Models

arXiv:2603.08913v1 Announce Type: new Genomic language models (GLMs) have emerged as powerful tools for learning representations of DNA sequences, enabling advances in variant prediction, regulatory element identification, and cross-task transfer learning. However, as these models are increasingly trained or fine-tuned on sensitive genomic cohorts, they risk memorizing specific sequences from their