Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention 2024-10-15
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations 2024-10-15
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents 2024-10-15
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models 2024-10-15
Rethinking Data Selection at Scale: Random Selection is Almost All You Need 2024-10-15
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory 2024-10-15
Tree of Problems: Improving structured problem solving with compositionality 2024-10-15
Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies 2024-10-15
TVBench: Redesigning Video-Language Evaluation 2024-10-15
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling 2024-10-15
Thinking LLMs: General Instruction Following with Thought Generation 2024-10-15
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models 2024-10-15
ReLU’s Revival: On the Entropic Overload in Normalization-Free Large Language Models 2024-10-15
Baichuan-Omni Technical Report 2024-10-14
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis 2024-10-14
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning 2024-10-14
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models 2024-10-14
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization 2024-10-14
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness 2024-10-14
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights 2024-10-14