AI RESEARCH
Learning Multimodal Energy-Based Model with Multimodal Variational Auto-Encoder via MCMC Revision
arXiv CS.AI
arXiv:2605.00644v1 Announce Type: cross. Energy-based models (EBMs) are a flexible class of deep generative models and are well suited to capturing complex dependencies in multimodal data. However, learning a multimodal EBM by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling in the joint data space, where noise-initialized Langevin dynamics often mixes poorly and fails to discover coherent inter-modal relationships. Multimodal VAEs have made progress in capturing such inter-modal dependencies by
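For readers unfamiliar with the sampler the abstract refers to, here is a minimal sketch of unadjusted Langevin dynamics targeting a density proportional to exp(-E(x)). It is not the authors' implementation. The quadratic toy energy, the mean `mu`, the step size, and the number of steps are all hypothetical choices made for illustration. On this easy unimodal energy the chains mix quickly. The poor mixing described above arises in high-dimensional, multimodal joint data spaces that this toy does not capture.

```python
import numpy as np

def langevin_sample(x0, grad_energy, step_size=0.1, n_steps=500, rng=None):
    """Unadjusted Langevin dynamics targeting p(x) proportional to exp(-E(x)).

    Each step: x <- x - (step_size / 2) * grad E(x) + sqrt(step_size) * noise.
    x0 holds one chain per row; all chains are updated in parallel.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

# Hypothetical toy energy: E(x) = 0.5 * ||x - mu||^2, so grad E(x) = x - mu.
mu = np.array([3.0, -2.0])

def grad_energy(x):
    return x - mu  # broadcasts over a batch of chains with shape (n, 2)

rng = np.random.default_rng(0)
x_init = rng.standard_normal((2000, 2))  # noise initialization, far from mu
samples = langevin_sample(x_init, grad_energy, rng=rng)
print(samples.mean(axis=0))  # should be close to mu
```

The initial state `x0` is an ordinary argument, so the same loop could start from noise or from samples produced by another model. The title's phrase "MCMC revision" suggests the paper starts the chains from multimodal VAE samples rather than from noise. That reading is an assumption here, because the abstract is truncated before the method is described.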