SemanticDialect: Semantic-Aware Mixed-Format Quantization for Video Diffusion Transformers

arXiv:2603.02883v2 Announce Type: replace Diffusion Transformers (DiT) achieve strong video generation quality, but their memory and compute costs hinder edge deployment. Quantization can reduce these costs, yet existing methods often degrade video quality when faced with high activation variation and the need to preserve semantic and temporal coherence.