AI Native Daily Paper Digest – 20251016

1. UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
🔑 Keywords: AI-generated summary, Dynamic-Capacity Mixture-of-Experts, cross-domain synergy, speech and music generation, data imbalance
💡 Category: Generative Models
🌟 Research Objective:
– Develop a unified speech and music generation model by tackling task conflicts and data imbalance, thereby enhancing cross-domain synergy.
🛠️ Research Methods:
– Utilized a Dynamic-Capacity Mixture-of-Experts framework with a Top-P routing strategy for per-token expert allocation and a hybrid expert design that separates domain-specific from domain-agnostic processing (a routing sketch follows the paper link).
– Implemented a three-stage training curriculum to manage data imbalance and improve model performance: Independent Specialist Training, MoE Integration and Warmup, and Synergistic Joint Training (a schedule sketch also appears below).
💬 Research Conclusions:
– The UniMoE-Audio model achieves state-of-the-art performance in both speech and music generation, demonstrating the potential of specialized MoE architectures and curated training strategies for universal audio generation.
👉 Paper link: https://huggingface.co/papers/2510.13344
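
The routing idea, as we read the summary, is nucleus-style expert selection: instead of a fixed Top-K, each token activates the smallest set of experts whose cumulative routing probability exceeds a threshold p, so the number of active experts adapts per token (dynamic capacity). Below is a minimal PyTorch sketch of such Top-P routing; the function name, the softmax router, and the threshold value are our assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def top_p_route(router_logits: torch.Tensor, p: float = 0.7):
    """Per token, pick the smallest set of experts whose cumulative
    routing probability exceeds p (Top-P / nucleus-style routing).

    router_logits: [num_tokens, num_experts]
    Returns a boolean expert mask and renormalized gate weights.
    """
    probs = F.softmax(router_logits, dim=-1)                      # routing distribution
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum_probs = sorted_probs.cumsum(dim=-1)
    # Keep an expert if the cumulative mass before it is still below p;
    # always keep at least the top-1 expert.
    keep_sorted = (cum_probs - sorted_probs) < p
    keep_sorted[..., 0] = True
    # Map the keep decisions back to the original expert order.
    mask = torch.zeros_like(probs, dtype=torch.bool).scatter(-1, sorted_idx, keep_sorted)
    # Renormalize gate weights over the selected experts only.
    gates = probs * mask
    gates = gates / gates.sum(dim=-1, keepdim=True)
    return mask, gates

# Example: 4 tokens routed over 8 experts; expert counts vary per token.
logits = torch.randn(4, 8)
mask, gates = top_p_route(logits, p=0.7)
print(mask.sum(dim=-1))  # active experts per token (the "dynamic capacity")
```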
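
The three-stage curriculum can likewise be pictured as a staged training schedule. The sketch below is purely illustrative: only the stage names come from the summary, while the trainable modules and data mixes are assumptions.

```python
# Hypothetical schedule for the three-stage curriculum; module names,
# freezing choices, and data mixes are assumptions for illustration.
CURRICULUM = [
    # Stage 1: train speech and music specialists independently on their own data.
    {"stage": "independent_specialist_training",
     "trainable": ["speech_expert", "music_expert"],
     "data_mix": {"speech": 1.0, "music": 1.0}},
    # Stage 2: assemble specialists into the MoE and warm up the router / shared parts.
    {"stage": "moe_integration_and_warmup",
     "trainable": ["router", "shared_experts"],
     "data_mix": {"speech": 0.5, "music": 0.5}},
    # Stage 3: end-to-end joint training on a balanced speech+music mixture.
    {"stage": "synergistic_joint_training",
     "trainable": ["all"],
     "data_mix": {"speech": 0.5, "music": 0.5}},
]
```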
