Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization (2024-11-22)
Multimodal Autoregressive Pre-training of Large Vision Encoders (2024-11-22)
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions (2024-11-22)
Hymba: A Hybrid-head Architecture for Small Language Models (2024-11-22)
Natural Language Reinforcement Learning (2024-11-22)
Ultra-Sparse Memory Network (2024-11-22)
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs (2024-11-22)
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models (2024-11-22)
Stable Flow: Vital Layers for Training-Free Image Editing (2024-11-22)
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models (2024-11-22)
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages (2024-11-22)
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control (2024-11-22)
Patience Is The Key to Large Language Model Reasoning (2024-11-22)
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation (2024-11-22)
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding (2024-11-22)
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration (2024-11-21)
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models (2024-11-21)
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation (2024-11-21)
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory (2024-11-21)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents (2024-11-21)