Movie Gen: A Cast of Media Foundation Models 2024-10-18
MobA: A Two-Level Agent System for Efficient Mobile Task Automation 2024-10-18
Harnessing Webpage UIs for Text-Rich Visual Understanding 2024-10-18
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation 2024-10-18
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models 2024-10-18
BenTo: Benchmark Task Reduction with In-Context Transferability 2024-10-18
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models 2024-10-18
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment 2024-10-18
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control 2024-10-18
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines 2024-10-18
MoH: Multi-Head Attention as Mixture-of-Head Attention 2024-10-18
VidPanos: Generative Panoramic Videos from Casual Panning Videos 2024-10-18
JudgeBench: A Benchmark for Evaluating LLM-based Judges 2024-10-18
FlatQuant: Flatness Matters for LLM Quantization 2024-10-18
Retrospective Learning from Interactions 2024-10-18
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation 2024-10-18
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens 2024-10-18
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models 2024-10-18
Can MLLMs Understand the Deep Implication Behind Chinese Images? 2024-10-18
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant 2024-10-18