China AI Native Industry Insights – 20250514 – Monica | ByteDance | Tencent | more

Explore Manus’s expanded access with free daily tasks and 1,000 bonus credits, ByteDance’s Seed1.5-VL multimodal AI model, Tencent’s first unified multimodal chain-of-thought reward model, and StepFun’s open-source Step1X-3D framework for 3D asset generation. Discover more in today’s China AI Native Industry Insights.
1. Manus Expands Access: Free Daily Tasks, 1,000 Bonus Credits, and No Waitlist
🔑 Key Details:
– Manus is now open to all users—no waitlist required.
– All users get one free task per day (300 credits) plus 1,000 bonus credits.
– Paid plans starting at $19/month offer additional features, priority access, and more usage.
💡 How It Helps:
– New users can try Manus instantly without barriers.
– Free daily tasks support light, consistent usage.
– Flexible plans cater to creators, developers, and teams needing higher capacity.
🌟 Why It Matters:
By removing the waitlist and adding daily free tasks, Manus lowers the entry barrier for users to explore its AI capabilities. With generous credit bonuses and scalable subscription options, it’s now easier than ever to experience and adopt Manus across various use cases.
Original article: https://x.com/ManusAI_HQ/status/1921943525261742203
Video Credit: ManusAI (@ManusAI_HQ on X)
2. ByteDance Unveils Seed1.5-VL: An Advanced Multimodal AI Model for Images, Videos, GUI, and Games
🔑 Key Details:
– Powerful Performance: Seed1.5-VL achieves state-of-the-art results on 38 of 60 public benchmarks with only 20B active parameters, delivering performance comparable to Gemini 2.5 Pro.
– Comprehensive Architecture: Combines a 532M-parameter SeedViT visual encoder, an MLP adapter, and Seed1.5-LLM, a Mixture-of-Experts (MoE) language model with 20B active parameters.
– Extensive Training: The model was trained on over 3T tokens of multimodal data spanning images, video, OCR, charts, and GUI interactions.
– Advanced Capabilities: Excels at visual reasoning, object localization/counting, video understanding, and GUI interactions for automation.
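The "20B active parameters" figure comes from the MoE design: in an MoE layer, each token is routed to only a few experts, so only a fraction of the model's weights take part in any single forward pass. The plain-Python sketch below illustrates generic top-k gating under that assumption. The expert count, k, gate logits, and toy experts are all hypothetical, and the source does not describe how Seed1.5-LLM actually routes tokens.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy expert: maps a feature vector to a feature vector.
Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(token: List[float], gate_logits: Sequence[float],
              experts: Sequence[Expert], k: int = 2) -> Tuple[List[float], List[int]]:
    """Route one token to its k highest-scoring experts and mix their outputs."""
    # Indices of the k largest gate logits: the only experts that run for this token.
    chosen = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the chosen experts only.
    weights = softmax([gate_logits[i] for i in chosen])
    out = [0.0] * len(token)
    for w, i in zip(weights, chosen):
        y = experts[i](token)
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


# Four toy experts that scale their input by 1, 2, 3, and 4 (illustrative only).
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
out, used = top_k_moe([1.0, 1.0], gate_logits=[0.3, 2.0, -1.0, 1.5], experts=experts, k=2)
print(used, out)  # experts 1 and 3 are active; experts 0 and 2 are skipped
```

With four experts and k=2, half of the expert weights sit idle for this token. Scaled up, the same principle lets an MoE model hold more total parameters than it activates per token.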
💡 How It Helps:
– AI Developers: API access via Volcano Engine enables integration of advanced visual understanding into applications at lower computational cost.
– Testing Teams: ByteDance’s testing department already uses the model for automated regression testing, reducing manual work.
– Content Creators: Enhanced video understanding with temporal localization enables precise content analysis and timestamp identification.
– Designers: Superior GUI understanding facilitates automatic interaction with interfaces across PC and mobile environments.
🌟 Why It Matters:
Seed1.5-VL represents a significant advancement in efficient multimodal AI that balances performance with practical deployment considerations. Its ability to handle complex visual reasoning, temporal video understanding, and interactive GUI tasks demonstrates multimodal AI’s evolution toward general-purpose assistants. ByteDance’s focus on reducing computational requirements while maintaining competitive performance against larger models shows the industry’s shift toward more sustainable and accessible AI technologies that can be practically implemented in real-world applications.
Original Chinese article: https://mp.weixin.qq.com/s/uWvOVPEowCXAuowrTKefiA
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FuWvOVPEowCXAuowrTKefiA
Video Credit: The original article
3. Tencent Unveils First Unified Multimodal Chain-of-Thought Reward Model
🔑 Key Details:
– First Unified Multimodal CoT Reward Model: Tencent and partners developed UnifiedReward-Think, the first unified multimodal reward model with chain-of-thought (CoT) reasoning capabilities.
– Three-Stage Training: Employs a novel Cold Start → Rejection Sampling → GRPO (Group Relative Policy Optimization) framework to progressively build reasoning abilities.
– Performance Boost: Delivers significant improvements across image generation and understanding tasks, even when reasoning implicitly, i.e., without writing out an explicit reasoning chain.
– Open-Source Release: Code, model, datasets, and training scripts available on GitHub and Hugging Face.
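GRPO, the final training stage, samples a group of responses for the same prompt and uses each response's reward relative to the rest of the group as its advantage, so no separate value model is needed. The sketch below shows only that advantage computation. The binary reward (1.0 for a correct final judgment, 0.0 otherwise) is an illustrative assumption, and the policy-gradient update, ratio clipping, and KL penalty are omitted.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward normalized by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: four sampled reasoning chains for one prompt,
# rewarded 1.0 when the final judgment is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

A group in which every sample earns the same reward yields all-zero advantages, so only prompts with mixed outcomes push the policy in any direction.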
💡 How It Helps:
– AI Researchers: Access to complete implementation enables further exploration of reward modeling with reasoning capabilities.
– Model Developers: Framework for building more interpretable and accurate evaluation systems for multimodal AI.
– Content Evaluators: More reliable assessment of complex visual tasks with detailed dimension-specific reasoning.
🌟 Why It Matters:
This work turns reward models from simple scorers into evaluators that reason logically before reaching a judgment. By bridging surface-level evaluation and human-like reasoning, UnifiedReward-Think is a significant step toward aligning AI with human preferences, especially for complex visual tasks where earlier reward models struggled with accuracy and interpretability.
Original Chinese article: https://mp.weixin.qq.com/s/HAli4g-gYMl0XsegJwR2xA
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%2FHAli4g-gYMl0XsegJwR2xA
Video Credit: The original article
4. Step1X-3D: Open-Source Framework Enables High-Fidelity 3D Asset Creation
🔑 Key Details:
– Two-Stage Architecture: Combines a hybrid VAE-DiT model for geometry with an SD-XL-based model for textures, producing watertight 3D assets with consistent textures.
– Extensive Data Curation: Processed more than 5M assets to build a curated dataset of 2M high-quality assets with standardized properties.
– Full Open-Source Release: Includes the 1.3B-parameter geometry model and the 3.5B-parameter texture model, training code, UIDs for 800K curated dataset assets, and an interactive demo.
– Cross-Domain Bridge: Supports direct transfer of 2D control techniques (e.g., LoRA) to 3D synthesis.
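LoRA, cited above as a 2D control technique that carries over to 3D synthesis, adapts a frozen weight matrix W with a low-rank update: W' = W + (α/r)·BA, where B is d×r, A is r×k, and the rank r is much smaller than d and k. The pure-Python sketch below merges such an update into a weight matrix. The matrix sizes, values, α, and r are hypothetical, and how Step1X-3D attaches LoRA adapters to its texture model is not described here.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A) as a new matrix; W itself is untouched."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Hypothetical 3x3 frozen weight with a rank-1 adapter (B: 3x1, A: 1x3).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.5, 0.0]]
merged = merge_lora(W, B, A, alpha=2.0, rank=1)
print(merged)
```

The savings come from training only A and B: for a 4096×4096 layer, a rank-16 adapter has 16 × (4096 + 4096) = 131,072 trainable parameters instead of 16,777,216.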
💡 How It Helps:
– 3D Content Creators: Generate high-quality textured 3D assets through text prompts with photorealistic, cartoon, or sketch styling options.
– AI Researchers: Access comprehensive framework with training code and curated datasets for advancing 3D generation research.
– Game Developers: Quickly produce customizable 3D assets with consistent geometry-texture alignment for game environments.
🌟 Why It Matters:
Step1X-3D addresses the three fundamental challenges that have hindered 3D generation progress: data scarcity, algorithmic limitations, and ecosystem fragmentation. By simultaneously advancing data quality standards, computational methods, and accessibility through open-source release, it establishes a new foundation for 3D asset generation that bridges the gap between research and practical applications while democratizing high-quality 3D creation tools.
Original article: https://github.com/stepfun-ai/Step1X-3D
Video Credit: The original article
That’s all for today’s China AI Native Industry Insights. Join us at the AI Native Foundation Membership Dashboard for the latest insights on AI Native, or follow our LinkedIn account at AI Native Foundation and our Twitter account at AINativeF.