Grounding Language in Multi-Perspective Referential Communication 2024-10-08
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach 2024-10-08
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation 2024-10-08
Addition is All You Need for Energy-efficient Language Models 2024-10-07
NL-Eye: Abductive NLI for Images 2024-10-07
Selective Attention Improves Transformer 2024-10-07
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise 2024-10-07
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding 2024-10-07
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models 2024-10-07
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond 2024-10-07
Erasing Conceptual Knowledge from Language Models 2024-10-07
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction 2024-10-07
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction 2024-10-07
NRGBoost: Energy-Based Generative Boosted Trees 2024-10-07
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning 2024-10-07
MLP-KAN: Unifying Deep Representation and Function Learning 2024-10-07
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark 2024-10-07
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs 2024-10-07
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs 2024-10-07
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models 2024-10-04