Video Instruction Tuning With Synthetic Data 2024-10-04
Loong: Generating Minute-level Long Videos with Autoregressive Language Models 2024-10-04
LLaVA-Critic: Learning to Evaluate Multimodal Models 2024-10-04
Contrastive Localized Language-Image Pre-Training 2024-10-04
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second 2024-10-04
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment 2024-10-04
Large Language Models as Markov Chains 2024-10-04
Distilling an End-to-End Voice Assistant Without Instruction Training Data 2024-10-04
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models 2024-10-04
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling 2024-10-04
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis 2024-10-04
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration 2024-10-04
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis 2024-10-04
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? 2024-10-04
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos 2024-10-04
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation 2024-10-04
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations 2024-10-04
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning 2024-10-04
Intelligence at the Edge of Chaos 2024-10-04
Learning the Latent Rules of a Game from Data: A Chess Story 2024-10-04