China AI Native Industry Insights - 20250508 - StepFun Technology | Tencent | Robot Era | Tsinghua University

Explore ACE-Step, the new open-source 3.5B music AI model; Tencent Yuanbao’s new text-to-image generation; and VPP, the open-source ‘Sora’ of robotics from Tsinghua University and Robot Era. Discover more in today’s China AI Native Industry Insights.

1. ACE-Step: StepFun and ACE Studio Open-Source 3.5B Music AI Model

🔑 Key Details:
– 3.5B Parameter Model: ACE-Step supports music creation in 19 languages with high-quality generation in as little as 15 seconds.
– Advanced Capabilities: Features precise editing, retake/repaint functions, and supports both vocal and instrumental music generation.
– Technical Innovation: Uses a single-stage DiT architecture with REPA and combines DCAE with a linear Transformer for faster generation.
– Flexible Extension: Supports LoRA and ControlNet fine-tuning for style customization and accompaniment generation.
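
For readers unfamiliar with LoRA, the general idea can be sketched in a few lines: a frozen base weight W is adjusted by a small trainable low-rank product, W + (alpha / r) * B * A. This is a minimal illustration of the generic technique only. The source does not show ACE-Step's fine-tuning code, and the function names, shapes, and values below are hypothetical.

```python
# Illustrative LoRA sketch. Shapes and values are made up for this example
# and are not taken from ACE-Step.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)              # A has shape (r, in_features)
    delta = matmul(b, a)    # B has shape (out_features, r)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Frozen 2x3 base weight (hypothetical values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Rank-1 adapter: A is 1x3, B is 2x1.
A = [[0.5, 0.5, 0.5]]
B_init = [[0.0], [0.0]]      # zero-initialized B leaves the base model unchanged
B_trained = [[2.0], [0.0]]   # pretend values after fine-tuning

unchanged = lora_effective_weight(W, A, B_init, alpha=1.0)
adapted = lora_effective_weight(W, A, B_trained, alpha=1.0)
```

Only A and B are trained, so a style adapter stores far fewer values than the full weight matrix it modifies.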

💡 How It Helps:
– Music Creators: Offers multilingual songwriting tools with precise lyric control while maintaining melody integrity.
– AI Developers: Provides an extensible foundation model that lowers the barrier to building music AI applications.
– Content Producers: Enables exact-length music generation for ads and media without additional editing.
– Cross-cultural Artists: Facilitates creation in 19 languages for global audience targeting.

🌟 Why It Matters:
ACE-Step represents a potential “Stable Diffusion moment” for AI music generation, combining technical innovation with practical creative tools. The partnership between StepFun and ACE Studio demonstrates how specialized domain expertise can enhance multimodal AI development, potentially making music creation more accessible while maintaining professional quality standards.

Original Chinese article: https://mp.weixin.qq.com/s/H4K9mPxIrcT1uF4mo5Q0dg

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FH4K9mPxIrcT1uF4mo5Q0dg

Video Credit: ACE Studio (@ACEStudio_en on X)

2. Tencent’s Yuanbao App Adds DeepSeek Text-to-Image Generation

🔑 Key Details:
– New Feature: Tencent’s Yuanbao app now supports text-to-image generation with both Hunyuan and DeepSeek models.
– Prompting Enhancement: The system automatically expands simple user prompts into more detailed descriptions before generating images.
– User Experience: Simple one-sentence descriptions can create high-quality images in various styles and aspect ratios.
– Quality Improvement: The latest Hunyuan text-to-image model offers better text-image consistency and overall image quality.

💡 How It Helps:
– Creative Professionals: Quick generation of concept art, illustrations, and design mockups from simple text descriptions.
– Marketers: Easy creation of promotional visuals without requiring design expertise or complex prompt engineering.
– Product Designers: Rapid visualization of product concepts, as demonstrated with the air purifier example.
– Casual Users: Accessible image creation without needing technical knowledge of prompting techniques.

🌟 Why It Matters:
Tencent’s integration of multiple AI models into its Yuanbao platform makes image generation technology considerably more accessible. By removing the barrier of prompt engineering expertise, Tencent puts advanced AI creativity within reach of everyday users. This multi-model approach also positions Yuanbao as a versatile creative tool in China’s competitive AI landscape, allowing users to draw on different models’ strengths without switching platforms.

Original Chinese article: https://mp.weixin.qq.com/s/2FG1fRPl2cL7ZjWKLTzqtg

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2F2FG1fRPl2cL7ZjWKLTzqtg

Video Credit: The original article

3. Tsinghua and Robot Era Open-Source VPP: The ‘Sora’ of Robotics

🔑 Key Details:
– ICML 2025 Spotlight Recognition: VPP (Video Prediction Policy), developed by Tsinghua University and Robot Era, was selected as a Spotlight paper, a distinction given to just 2.6% of 12,000+ submissions.
– Video Diffusion Approach: Utilizes internet video data to learn human actions, reducing dependence on high-quality robot training data.
– Real-time Prediction: Solves diffusion inference speed challenges, enabling robots to predict and execute actions in real-time.
– Cross-platform Capability: Functions across different humanoid robot platforms, potentially accelerating commercial deployment.

💡 How It Helps:
– Robotics Engineers: Fully open-sourced code and implementation guidelines for deployment on standard platforms.
– AI Researchers: Novel approach combining AIGC video diffusion with robot control, addressing Moravec’s paradox.
– Product Developers: Framework supports 100+ dexterous manipulation tasks on single-arm platforms and 50+ tasks on bimanual humanoid robots.
– QA Teams: Interpretable visual representations allow debugging without extensive real-world testing.

🌟 Why It Matters:
VPP represents a significant shift in embodied AI by bringing AIGC’s generative capabilities to physical robotics. Its approach of predicting future states before execution creates more adaptable robots that can generalize across tasks and embodiments. Complementary to VLA models such as PI, VPP demonstrates how different AI paradigms can advance robotics, pointing toward a future where embodied AGI becomes increasingly tangible and commercially viable.

Original Chinese article: https://mp.weixin.qq.com/s/lU6a-ay758DgpMzyWhI0FQ

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FlU6a-ay758DgpMzyWhI0FQ

Video Credit: The original article

4. Lenovo Unveils Comprehensive AI Agent Matrix at Tech World 2025

🔑 Key Details:
– Lenovo launches ‘AI Super Agent’ covering personal, enterprise, and city scenarios at Tech World 2025 in Shanghai.
– Three core capabilities power all agents: perception and interaction, cognition and decision-making, and autonomous evolution.
– New inference acceleration engine enables on-device AI with 3x performance increase expected within 12 months.
– Four new AI devices announced featuring the ‘Tianxi’ personal AI agent, including the world’s first rollable-screen AI PC.

💡 How It Helps:
– Remote workers: ‘Tianxi’ personal AI agent can autonomously manage complex tasks across devices and personal cloud data.
– Enterprise leaders: ‘Lexiang’ enterprise agent functions as a digital employee for marketing, sales, procurement, and service engineering.
– Urban planners: City-level super agents deployed in Wuyishan and Yichang demonstrate effective governance and social service capabilities.
– Sports organizations: FIFA World Cup partnership will enhance player analytics, referee decisions, and fan experiences.

🌟 Why It Matters:
Lenovo’s comprehensive AI strategy marks a paradigm shift from AI as tools to AI as cognitive operating systems. By establishing a full ecosystem spanning personal devices to city infrastructure, Lenovo positions itself at the forefront of AI’s next evolution. The company’s focus on on-device inference and privacy protection addresses critical industry challenges, while its mixed computing infrastructure approach creates a blueprint for enterprise-scale AI adoption that balances performance with practical deployment.

Original Chinese article: https://mp.weixin.qq.com/s/Ny8H9yx7uYolgTOuszhnRA

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FNy8H9yx7uYolgTOuszhnRA

Video Credit: The original article

That’s all for today’s China AI Native Industry Insights. Join us at the AI Native Foundation Membership Dashboard for the latest insights on AI Native, or follow our LinkedIn account at AI Native Foundation and our Twitter account at AINativeF.
