AI Native Daily Paper Digest – 20250715
1. SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation 🔑 Keywords: SpeakerVid-5M, audio-visual, virtual human, large-scale dataset, dyadic interaction […]
AI Native Daily Paper Digest – 20250714
1. Test-Time Scaling with Reflective Generative Model 🔑 Keywords: MetaStone-S1, Self-supervised Process Reward Model, Reflective Generative Model, Test Time Scaling, Scaling Law […]
AI Native Daily Paper Digest – 20250711
1. Scaling RL to Long Videos 🔑 Keywords: Vision-Language Models, Reinforcement Learning, Long Video QA, Multi-modal Reinforcement Sequence Parallelism, LongVideo-Reason 💡 Category: […]
AI Native Daily Paper Digest – 20250710
1. 4KAgent: Agentic Any Image to 4K Super-Resolution 🔑 Keywords: agentic super-resolution, Profiling, Perception Agent, Restoration Agent, low-level vision tasks 💡 Category: […]
AI Native Daily Paper Digest – 20250709
1. SingLoRA: Low Rank Adaptation Using a Single Matrix 🔑 Keywords: SingLoRA, Low-Rank Adaptation, parameter-efficient, fine-tuning, common sense reasoning 💡 Category: Foundations […]
AI Native Daily Paper Digest – 20250707
1. How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks 🔑 Keywords: Multimodal Foundation Models, prompt […]
AI Native Daily Paper Digest – 20250704
1. WebSailor: Navigating Super-human Reasoning for Web Agent 🔑 Keywords: WebSailor, LLM, proprietary agents, reasoning capabilities, complex information-seeking tasks 💡 Category: Reinforcement […]
AI Native Daily Paper Digest – 20250703
1. Kwai Keye-VL Technical Report 🔑 Keywords: Multimodal Large Language Models, short-video understanding, vision-language alignment 💡 Category: Multi-Modal Learning 🌟 Research Objective: […]
AI Native Daily Paper Digest – 20250702
1. GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning 🔑 Keywords: Vision-Language Model, Reinforcement Learning, Multimodal Reasoning, Curriculum Sampling, General-Purpose 💡 […]
AI Native Daily Paper Digest – 20250701
1. Ovis-U1 Technical Report 🔑 Keywords: Ovis-U1, multimodal understanding, text-to-image generation, image editing, diffusion-based visual decoder 💡 Category: Generative Models 🌟 Research […]
AI Native Daily Paper Digest – 20250630
1. BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing 🔑 Keywords: BlenderFusion, diffusion model, source masking, simulated object jittering 💡 Category: […]
AI Native Daily Paper Digest – 20250626
1. ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation 🔑 Keywords: ShareGPT-4o-Image, Janus-4o, text-to-image, photorealistic, dataset 💡 Category: Generative Models 🌟 Research […]