20241028 – ByteDance | Tencent | ChatGLM | MiniMax | more
1. Create digital humans without training—ByteDance’s PersonaTalk surpasses state-of-the-art (SOTA) standards in video lip-sync editing.
In the wake of the AIGC boom, advanced voice-driven video lip-syncing technologies are revolutionizing video content personalization. ByteDance’s innovative PersonaTalk recently earned a spot in the SIGGRAPH Asia 2024 Conference Track, showcasing its capability to edit videos without the constraints of original quality while ensuring efficient, high-quality outcomes. This new approach not only enhances video creation for digital avatars but also opens avenues for applications in areas like video translation and virtual teaching.
Original Chinese article: https://www.jiqizhixin.com/articles/2024-10-26
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.jiqizhixin.com%2Farticles%2F2024-10-26
2. Tencent develops the world’s first giant panda model: real-time recognition, statistics, and analysis of panda behavior, plus report generation.
Tencent has developed the world’s first intelligent behavioral recognition model for giant pandas, in collaboration with the China Panda Protection and Research Center and Guangdong University of Technology. The system identifies daily behaviors such as feeding and sleeping, generates automated reports, and raises behavior recognition accuracy in obstructed environments to over 80%. The model aims to assist caregivers in panda management and health monitoring, showcasing the intersection of AI and wildlife conservation.
Original Chinese article: https://www.ithome.com/0/805/488.htm
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.ithome.com%2F0%2F805%2F488.htm
3. ChatGLM launches the emotional voice model GLM-4-Voice: capable of understanding emotions, expressing feelings, and resonating with them.
ChatGLM has launched its new emotional voice model, GLM-4-Voice, designed to comprehend and convey emotions effectively. The model offers adjustable speech rates, supports multiple languages and dialects, and allows for flexible user interaction. With features like emotional expression and low latency, users can now experience more engaging and responsive communication through the “ChatGLM” app.
Original Chinese article: https://www.ithome.com/0/805/213.htm
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.ithome.com%2F0%2F805%2F213.htm
4. The large model unicorn MiniMax will launch its first end-to-end real-time voice conversation API product, comparable to GPT-4o, in November.
MiniMax, a unicorn in the AI model sector, is set to launch its first end-to-end real-time voice dialogue API this November, designed to compete directly with OpenAI’s GPT-4o. This innovative service aims to enhance multi-modal processing capabilities, offering lower latency and more natural interactions for various applications including enterprise collaboration and gaming. As the dialogue AI market is projected to reach 10.8 billion USD by 2026, MiniMax is positioning itself at the forefront of this growing industry.
Original Chinese article: https://www.tmtpost.com/7300910.html
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.tmtpost.com%2F7300910.html
5. Meitu upgrades the image generation capabilities of its MiracleVision large model and launches MOKI, a one-stop AI short video creation tool.
Meitu has upgraded its MiracleVision model for enhanced image generation and launched MOKI, an all-in-one AI short video creation tool. The platform accepts text input ranging from ultra-short to extended, and generates videos up to 1 minute long at 24 FPS and 1080p resolution, aiming to redefine the user experience in digital storytelling.
Original Chinese article: https://www.tmtpost.com/nictation/7300306.html
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.tmtpost.com%2Fnictation%2F7300306.html
6. Westlake University “AI Scientist” Nova Emerges: The Key to a New Era of Scientific Research.
Scientists are both astonished and intrigued by the capabilities of a new AI model named Nova, developed by a team from Westlake University in collaboration with other institutions. Nova is designed to rapidly generate groundbreaking scientific ideas, exhibiting a creative capacity 2.5 times greater than its predecessor, the AI Scientist. This innovative tool not only streamlines the research process but also promises to revolutionize the landscape of scientific discovery by significantly enhancing efficiency and productivity.
Original Chinese article: https://mp.weixin.qq.com/s?__biz=MzA5ODEzMjIyMA==&mid=2247718669&idx=1&sn=fe6968ba50cbedf08494a0cf29ad2d9a&chksm=909b9e9ea7ec1788272b2cdfe197a147b61584aa3922ad6377d835af7d2247479ddf239c4c10&scene=21#wechat_redirect
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzA5ODEzMjIyMA%3D%3D%26mid%3D2247718669%26idx%3D1%26sn%3Dfe6968ba50cbedf08494a0cf29ad2d9a%26chksm%3D909b9e9ea7ec1788272b2cdfe197a147b61584aa3922ad6377d835af7d2247479ddf239c4c10%26scene%3D21%23wechat_redirect
7. The voice large model “MaskGCT” is officially open-sourced, providing services for short dramas, games, digital humans, and other products.
The innovative voice model “MaskGCT” has been officially open-sourced, offering services for short dramas, games, and digital avatars. This state-of-the-art model stands out with its rapid sound cloning, multilingual capabilities, and enhanced speech synthesis quality, allowing for natural-sounding audio across various languages. As the demand for international content grows, MaskGCT is positioned to transform the landscape of digital media production.
Original Chinese article: https://sohu.com/a/820095321_114778
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fsohu.com%2Fa%2F820095321_114778
8. Zhipu AI launches AutoGLM agent: input commands to simulate human operation of a mobile phone.
The Zhipu AI team has unveiled AutoGLM, an innovative agent based on GLM technology that simulates human smartphone operations for a variety of tasks. This advancement marks a significant leap in AI applications within the “Phone Use” domain, enabling users to perform actions like making purchases or booking hotels seamlessly. With its self-improvement capabilities and enhanced task execution precision, AutoGLM demonstrates substantial performance gains over leading models, revolutionizing daily interactions with technology.
Original Chinese article: https://www.aibase.com/zh/news/12754
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.aibase.com%2Fzh%2Fnews%2F12754
9. Does autonomous driving also need to venture into the “metaverse”? GigaAI uses AI to enhance the experience, making 4D scene reconstruction smoother!
GigaAI has unveiled DriveDreamer4D, an innovative framework designed to enhance 4D driving scene reconstruction by leveraging world model knowledge. Unlike traditional methods that struggle with diverse driving scenarios, DriveDreamer4D incorporates AI to predict complex situations, significantly improving the fidelity of generated imagery and vehicle positioning. Currently in the research phase, this breakthrough technology promises to revolutionize automated driving by offering a safer and more reliable testing environment.
Original Chinese article: https://www.aibase.com/zh/news/12776
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.aibase.com%2Fzh%2Fnews%2F12776
10. Xiaomi’s Wang Teng: The Xiaomi 15 series starts with 12GB of memory due to increased memory usage for edge AI.
Xiaomi’s Wang Teng announced that future flagship models will likely eliminate the 8GB RAM option, as on-device AI applications require more memory. He advised buyers to consider models with more RAM for optimal performance, given the rapid integration of AI capabilities in smartphones.
Original Chinese article: https://www.ithome.com/0/805/812.htm
English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fwww.ithome.com%2F0%2F805%2F812.htm
That’s all for today’s China AI Native Industry Insights. Join us at AI Native Foundation Membership Dashboard for the latest insights on AI Native, or follow our LinkedIn account at AI Native Foundation and our Twitter account at AINativeF.