AI RESEARCH

Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction

arXiv cs.AI

arXiv:2605.12286v1 Announce Type: cross Microbiome functions are encoded within the genes of the community-wide metagenome. A natural question is whether properties of a microbial community can be predicted solely from the raw DNA sequences of its members. In this work, we employ set-aggregated genome embeddings (SAGE) to predict community-level abundance profiles, exploiting the few-shot learning capabilities of genomic language models (GLMs). We benchmark this approach and show improved generalization to novel genomes compared with classical bioinformatics approaches.
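The abstract does not spell out how the set aggregation works, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each community member already has a fixed-length embedding from some genomic language model (random vectors stand in for these here), pools the embeddings with a permutation-invariant mean, and scores each genome against that pooled context before normalizing with a softmax. The function names, the element-wise interaction, and the `weights` vector are all hypothetical.

```python
import math
import random


def mean_pool(embeddings):
    """Permutation-invariant aggregation: element-wise mean over the set."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[j] for e in embeddings) / n for j in range(dim)]


def softmax(scores):
    """Numerically stable softmax, so the outputs sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def predict_abundances(embeddings, weights):
    """Map a set of genome embeddings to a relative abundance profile.

    Assumption: each genome is scored by a weighted element-wise
    interaction between its own embedding and the pooled community
    context. This is an illustrative choice, not the paper's model.
    """
    context = mean_pool(embeddings)
    scores = [
        sum(w * x * c for w, x, c in zip(weights, e, context))
        for e in embeddings
    ]
    return softmax(scores)


# Hypothetical stand-ins for GLM embeddings of a three-member community.
random.seed(0)
DIM = 4
community = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(3)]
weights = [random.gauss(0.0, 1.0) for _ in range(DIM)]

profile = predict_abundances(community, weights)
print(profile)
```

Because the pooled context is a mean over the set, reordering the genomes reorders the predicted abundances in the same way without changing their values, which is the property a set-based predictor needs. In practice the weights would be learned from communities with measured abundances.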