AI Native Daily Paper Digest – 20251016

1. UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
🔑 Keywords: AI-generated summary, Dynamic-Capacity Mixture-of-Experts, cross-domain synergy, speech and music generation, data imbalance
💡 Category: Generative Models
🌟 Research Objective:
– Develop a unified speech and music generation model by tackling task conflicts and data imbalance, thereby enhancing cross-domain synergy.
🛠️ Research Methods:
– Utilized a Dynamic-Capacity Mixture-of-Experts framework with a Top-P routing strategy for per-token expert allocation and a hybrid expert design that separates domain-specific from domain-agnostic processing (a routing sketch follows the paper link).
– Implemented a three-stage training curriculum to manage data imbalance and improve model performance: Independent Specialist Training, MoE Integration and Warmup, and Synergistic Joint Training (a schedule sketch also appears below).
💬 Research Conclusions:
– The UniMoE-Audio model achieves state-of-the-art performance in both speech and music generation, demonstrating the potential of specialized MoE architectures and curated training strategies for universal audio generation.
👉 Paper link: https://huggingface.co/papers/2510.13344
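
The routing idea, as we read the summary, is nucleus-style expert selection: instead of a fixed Top-K, each token activates the smallest set of experts whose cumulative routing probability exceeds a threshold p, so the number of active experts adapts per token (dynamic capacity). Below is a minimal PyTorch sketch of such Top-P routing; the function name, the softmax router, and the threshold value are our assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def top_p_route(router_logits: torch.Tensor, p: float = 0.7):
    """Per token, pick the smallest set of experts whose cumulative
    routing probability exceeds p (Top-P / nucleus-style routing).

    router_logits: [num_tokens, num_experts]
    Returns a boolean expert mask and renormalized gate weights.
    """
    probs = F.softmax(router_logits, dim=-1)                      # routing distribution
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum_probs = sorted_probs.cumsum(dim=-1)
    # Keep an expert if the cumulative mass before it is still below p;
    # always keep at least the top-1 expert.
    keep_sorted = (cum_probs - sorted_probs) < p
    keep_sorted[..., 0] = True
    # Map the keep decisions back to the original expert order.
    mask = torch.zeros_like(probs, dtype=torch.bool).scatter(-1, sorted_idx, keep_sorted)
    # Renormalize gate weights over the selected experts only.
    gates = probs * mask
    gates = gates / gates.sum(dim=-1, keepdim=True)
    return mask, gates

# Example: 4 tokens routed over 8 experts; expert counts vary per token.
logits = torch.randn(4, 8)
mask, gates = top_p_route(logits, p=0.7)
print(mask.sum(dim=-1))  # active experts per token (the "dynamic capacity")
```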
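
The three-stage curriculum can likewise be pictured as a staged training schedule. The sketch below is purely illustrative: only the stage names come from the summary, while the trainable modules and data mixes are assumptions.

```python
# Hypothetical schedule for the three-stage curriculum; module names,
# freezing choices, and data mixes are assumptions for illustration.
CURRICULUM = [
    # Stage 1: train speech and music specialists independently on their own data.
    {"stage": "independent_specialist_training",
     "trainable": ["speech_expert", "music_expert"],
     "data_mix": {"speech": 1.0, "music": 1.0}},
    # Stage 2: assemble specialists into the MoE and warm up the router / shared parts.
    {"stage": "moe_integration_and_warmup",
     "trainable": ["router", "shared_experts"],
     "data_mix": {"speech": 0.5, "music": 0.5}},
    # Stage 3: end-to-end joint training on a balanced speech+music mixture.
    {"stage": "synergistic_joint_training",
     "trainable": ["all"],
     "data_mix": {"speech": 0.5, "music": 0.5}},
]
```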
