SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models 2024-11-08
GazeGen: Gaze-Driven User Interaction for Visual Content Generation 2024-11-08
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation 2024-11-08
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation 2024-11-08
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models 2024-11-08
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models 2024-11-08
Analyzing The Language of Visual Tokens 2024-11-08
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding 2024-11-08
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination 2024-11-07
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level 2024-11-07
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models 2024-11-07
Self-Consistency Preference Optimization 2024-11-07
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond 2024-11-07
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems 2024-11-06
LLaMo: Large Language Model-based Molecular Graph Assistant 2024-11-06
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution 2024-11-06
Controlling Language and Diffusion Models by Transporting Activations 2024-11-06
Sample-Efficient Alignment for LLMs 2024-11-06
DreamPolish: Domain Score Distillation With Progressive Geometry Generation 2024-11-06
Adaptive Length Image Tokenization via Recurrent Allocation 2024-11-06