AI Native Foundation

Explore ByteDance’s new AI model INFP, which brings static portrait photos to life; Alibaba and Wuhan University’s groundbreaking self-supervised makeup transfer model, presented at NeurIPS 2024; and Kunlun Tech’s launch of Skywork o1 and Skywork 4o, free to use via the Tiangong app and website. Discover more in today’s China AI Native Industry Insights.

1. ByteDance launches new AI model INFP to let static portrait photos “speak”

🔑 Key Details:
– INFP animates static portraits from audio input, letting them ‘speak’ and switch smoothly between dialogue roles.
– A ‘Head Mimicry’ stage animates photos using facial movements extracted from videos.
– ‘Audio Guidance’ drives precise lip-syncing and expressive animation.
– The team also introduces a new dialogue dataset, DyConv, with over 200 hours of high-quality conversation data to improve emotional expression.
– Future work will explore image and text inputs, with full-body animation as a goal.

💡 How It Helps:
– Content Creators: INFP enables creators to breathe life into their photographs, enhancing storytelling and engagement.
– Digital Marketers: The technology offers innovative ways to create personalized and dynamic content, improving audience connection.
– Developers: The published research framework provides a strong technical foundation for future advances in AI-driven interaction.

🌟 Why It Matters:
The launch of INFP marks a significant step in AI technology and strengthens ByteDance’s position in the creative and entertainment sectors. As demand for interactive content grows, INFP offers new ways to engage audiences meaningfully. ByteDance’s proactive measures to address misinformation risks also reflect a responsible approach to AI development.

Original Chinese article: https://mp.weixin.qq.com/s?__biz=Mzk1NzU0MjIwOQ==&mid=2247483785&idx=3&sn=848a7c577decc73ec5d4e97f7b037fbf&chksm=c27d8f54f9aa68653731fd9cbb05dd47a15e83c74d984552a255215069d112a337effcbb86c9#rd

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzk1NzU0MjIwOQ%3D%3D%26mid%3D2247483785%26idx%3D3%26sn%3D848a7c577decc73ec5d4e97f7b037fbf%26chksm%3Dc27d8f54f9aa68653731fd9cbb05dd47a15e83c74d984552a255215069d112a337effcbb86c9%23rd

Video Credit: the original article

2. SHMT: Alibaba & Wuhan University Unveil Self-Supervised Makeup Transfer Model at NeurIPS 2024

🔑 Key Details:
– SHMT Method: A novel self-supervised hierarchical makeup transfer method developed by Alibaba and Wuhan University to apply diverse makeup styles accurately.
– Overcoming Challenges: Addresses the lack of paired data and variability in makeup styles, enhancing the realism of makeup application.
– Innovative Techniques: Uses a ‘decouple and reconstruct’ strategy to avoid relying on misleading pseudo-paired data, and applies Laplacian pyramid decomposition to control texture detail.
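The article does not describe SHMT’s exact decomposition, but a Laplacian pyramid itself is a standard tool: an image is split into band-pass layers that hold progressively coarser texture, plus a small low-frequency base, and the layers add back up to the original. The sketch below is a generic NumPy illustration, not SHMT’s code; the 5-tap binomial blur, nearest-neighbour upsampling, and the function names are our own assumptions.

```python
import numpy as np


def blur(img):
    """Separable 5-tap binomial blur (a cheap Gaussian approximation), edge-padded.

    Assumption: SHMT's actual filter is not specified in the article.
    """
    k = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0
    padded = np.pad(img, 2, mode="edge")
    # Filter along rows, then along columns; "valid" mode restores the input size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def downsample(img):
    """Blur, then keep every second pixel in each direction."""
    return blur(img)[::2, ::2]


def upsample(img, shape):
    """Nearest-neighbour 2x upsampling cropped to `shape`, then smoothed."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[: shape[0], : shape[1]]
    return blur(up)


def laplacian_pyramid(img, levels):
    """Return [fine band, ..., coarse band, low-frequency base]."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # The band keeps exactly what the coarser level cannot represent (texture detail).
        pyramid.append(current - upsample(smaller, current.shape))
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert laplacian_pyramid by upsampling the base and adding each band back."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current, band.shape) + band
    return current
```

For a 64×48 array, laplacian_pyramid(img, 3) yields bands of shape 64×48, 32×24 and 16×12 plus an 8×6 base, and reconstruct() returns the original up to floating-point error. Because fine texture sits in the upper bands, a makeup model could in principle weight or swap individual bands to control how much texture is transferred.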

💡 How It Helps:
– AI Researchers: Offers a new framework for studying makeup transfer, enriching model training strategies without needing paired datasets.
– Developers: Provides open-source access to SHMT for further innovation in makeup and image processing technologies.
– Creative Professionals: Enables more realistic and customizable makeup effects on digital platforms, boosting user engagement.

🌟 Why It Matters:
This approach places SHMT at the forefront of makeup transfer research, with notable gains in handling diverse makeup styles and in realism. By relying on self-supervised learning, SHMT produces high-fidelity makeup results and lets developers and artists experiment with new aesthetics, potentially transforming digital beauty applications and the wider creative industry.

Original Chinese article: https://mp.weixin.qq.com/s?__biz=MzU2OTg5NTU2Ng==&mid=2247489721&idx=1&sn=2be8e8a5d80508b4133580ed51703f1a&chksm=fd676e4d82867ad5583cb5eca415c4c25d3623b9c6abf1b7c9f9abf31061a1d4fbdc92c901a3#rd

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzU2OTg5NTU2Ng%3D%3D%26mid%3D2247489721%26idx%3D1%26sn%3D2be8e8a5d80508b4133580ed51703f1a%26chksm%3Dfd676e4d82867ad5583cb5eca415c4c25d3623b9c6abf1b7c9f9abf31061a1d4fbdc92c901a3%23rd

Video Credit: the original article

3. Kunlun Tech Officially Launches Skywork o1 and Skywork 4o – Free Access via Tiangong App and Web!

🔑 Key Details:
– Launch Announcement: Kunlun Tech officially released ‘Skywork o1’ and ‘Skywork 4o’ models on January 6, 2025, accessible for free on the Tiangong app and website.
– Innovative Features: ‘Skywork o1’ excels in logical reasoning across various domains like mathematics and ethics, while ‘Skywork 4o’ offers a multi-modal voice assistant, Skyo, capable of real-time empathetic dialogue.
– Technical Advancements: The models use advanced training methods, including self-distillation and Q* reasoning algorithms, to improve efficiency and reasoning capabilities.
– Performance Benchmarks: ‘Skywork o1’ outperformed previous models in math and code evaluations.

💡 How It Helps:
– AI Developers: Open-source access promotes innovation and integration into various applications.
– Content Creators: Enhanced reasoning and logic capabilities assist in generating high-quality content and applications.
– Customer Support Teams: Skyo’s natural and quick responses improve user interactions, providing a better customer experience.

🌟 Why It Matters:
This launch strengthens Kunlun Tech’s strategic position in the AI landscape and underscores its commitment to advancing AI capabilities and user-centric innovation. By offering powerful models for free, the company encourages broader adoption of AI technologies and reinforces its competitive edge. As the field moves toward more sophisticated solutions, initiatives like this could reshape how users engage with AI in real-world scenarios.

Original Chinese article: https://mp.weixin.qq.com/s?__biz=MzI1MzE1NDc3Mg==&mid=2247504007&idx=1&sn=c04be656b0d56692975272c07bd794a7&chksm=e8811a07f5c623aebd35589cddb548fb5bcdb358e0cbb7ddc8e763e1a00d5fcf9b96161f05e7#rd

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzI1MzE1NDc3Mg%3D%3D%26mid%3D2247504007%26idx%3D1%26sn%3Dc04be656b0d56692975272c07bd794a7%26chksm%3De8811a07f5c623aebd35589cddb548fb5bcdb358e0cbb7ddc8e763e1a00d5fcf9b96161f05e7%23rd

Video Credit: the original article

4. ByteDance launches Infinity to improve the efficiency and quality of text-to-image synthesis

🔑 Key Details:
– New Framework: ByteDance’s ‘Infinity’ framework enhances text-to-image synthesis efficiency and quality, overcoming challenges faced by traditional methods.
– Bitwise Token Innovation: The model introduces a Bitwise Token autoregressive framework that better captures high-frequency signals, producing detail-rich images.
– Infinite Vocabulary: Infinity greatly expands the vocabulary available for image generation, improving accuracy on complex text instructions.
– Performance Edge: In user testing, Infinity outperformed existing models in both image quality and generation speed.
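The article does not explain how Bitwise Tokens work internally. One common way to reach a very large (‘infinite’) vocabulary is to quantize each latent channel to a single bit and predict the bits independently, so a d-bit token indexes 2^d entries without a 2^d-way output layer. The sketch below illustrates that idea under our own assumptions (sign-based quantization, independent per-bit sampling, and the hypothetical helper names to_bits, bits_to_index, sample_bits and head_params); it is not ByteDance’s implementation.

```python
import numpy as np


def to_bits(latent):
    """Assumed quantizer: map each latent channel to 1 if >= 0, else 0."""
    return (latent >= 0).astype(np.int64)


def bits_to_index(bits):
    """Read a d-bit code (last axis, most significant bit first) as an integer index."""
    weights = 1 << np.arange(bits.shape[-1])[::-1]
    return (bits * weights).sum(axis=-1)


def sample_bits(bit_logits, rng):
    """Sample every bit independently from its own sigmoid probability.

    This is d binary decisions instead of one softmax over 2**d classes.
    """
    probs = 1.0 / (1.0 + np.exp(-bit_logits))
    return (rng.random(probs.shape) < probs).astype(np.int64)


def head_params(hidden, d):
    """Output-layer weight counts (biases ignored) for a full softmax vs. a bitwise head."""
    return {"index_softmax": hidden * 2**d, "bitwise": hidden * d}


rng = np.random.default_rng(0)
token_bits = sample_bits(np.array([2.0, -1.5, 0.3, -3.0]), rng)
print(token_bits, bits_to_index(token_bits))
print(head_params(1024, 32))
```

With an illustrative hidden size of 1,024 and 32 bits per token, head_params shows the gap: about 4.4 trillion weights for a full softmax over 2^32 entries versus 32,768 for the bitwise head. That difference is what makes a vocabulary of this size practical at all.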

💡 How It Helps:
– Developers: Infinity’s innovative framework allows for more accurate and efficient image generation, streamlining development processes.
– Creators: The ability to quickly generate high-quality images enhances creative workflows in advertising, gaming, and film production.

🌟 Why It Matters:
The launch of Infinity puts ByteDance at the forefront of text-to-image research, with technical advances that could substantially benefit creative industries. By setting new performance benchmarks, Infinity opens the way to broader applications and gives ByteDance a competitive advantage in text-to-image synthesis, which matters for businesses that want to use AI in their creative processes.

Original Chinese article: https://mp.weixin.qq.com/s?__biz=MzkxODYzMTU1Mg==&mid=2247490900&idx=1&sn=7637283a5c1d52e326cbb9bb8366a4cb&chksm=c016e8b869fa0b9f3a7208afdff53bea5f69948a7775ffb5ace68b19e0dc7aac4aac733786b2#rd

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzkxODYzMTU1Mg%3D%3D%26mid%3D2247490900%26idx%3D1%26sn%3D7637283a5c1d52e326cbb9bb8366a4cb%26chksm%3Dc016e8b869fa0b9f3a7208afdff53bea5f69948a7775ffb5ace68b19e0dc7aac4aac733786b2%23rd

Video Credit: the original article

5. ByteDance Launches Open-Source SOTA Lip Sync Model LatentSync

🔑 Key Details:
– ByteDance has released LatentSync, a state-of-the-art (SOTA) lip sync model that synchronizes lip movements with input audio.
– The architecture covers both training and inference, using a variational autoencoder (VAE) for efficient encoding together with self-attention mechanisms, and the model is trained against a combination of loss terms.
– Measures such as TREPA, LPIPS, and mel-spectrogram loss are used to assess the model, and the results indicate its effectiveness.
– The model could substantially increase automation in news reporting, possibly changing the roles of news anchors and reporters.

💡 How It Helps:
– AI Developers: Open-source code offers opportunities for further innovation in lip sync technology and applications.
– Content Creators: The model enables enhanced video content generation, providing tools for more engaging storytelling.
– Media Organizations: Automated lip-sync capabilities could streamline video production, reducing time and costs.

🌟 Why It Matters:
The launch of LatentSync strengthens ByteDance’s position in AI-driven video technology and gives it a competitive advantage in the media landscape. By automating lip synchronization, the model improves video production efficiency and opens new possibilities for creative expression. The release reflects the ongoing integration of advanced AI into media workflows, raising expectations for content quality and viewer engagement.

Original Chinese article: https://mp.weixin.qq.com/s?__biz=MzA4NzgzMjA4MQ==&mid=2453460136&idx=2&sn=833f47fa1ed577cbb9f13ed06fd0a521&chksm=8663de9319ebec004dcb1e30b14e9f4044945b991a2d68946fe273c1cf90529a5f4e824cc210#rd

English translation via free online service: https://translate.google.com/translate?hl=en&sl=zh-CN&tl=en&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzA4NzgzMjA4MQ%3D%3D%26mid%3D2453460136%26idx%3D2%26sn%3D833f47fa1ed577cbb9f13ed06fd0a521%26chksm%3D8663de9319ebec004dcb1e30b14e9f4044945b991a2d68946fe273c1cf90529a5f4e824cc210%23rd

Video Credit: the original article

That’s all for today’s China AI Native Industry Insights. Join us at the AI Native Foundation Membership Dashboard for the latest insights on AI Native, or follow our LinkedIn account at AI Native Foundation and our Twitter account at AINativeF.

China AI Native Industry Insights – 20250106 – ByteDance | Alibaba | KUNLUN TECH | more
