VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks 2024-10-29
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction 2024-10-29
Neural Fields in Robotics: A Survey 2024-10-29
Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation 2024-10-29
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA 2024-10-29
Language Models And A Second Opinion Use Case: The Pocket Professional 2024-10-29
Bi-Level Motion Imitation for Humanoid Robots 2024-10-29
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting 2024-10-28
Continuous Speech Synthesis using per-token Latent Diffusion 2024-10-28
Teach Multimodal LLMs to Comprehend Electrocardiographic Images 2024-10-28
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design 2024-10-28
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality 2024-10-28
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data 2024-10-28
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark 2024-10-28
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance 2024-10-28
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback 2024-10-28
Analysing the Residual Stream of Language Models Under Knowledge Conflicts 2024-10-28
Counting Ability of Large Language Models and Impact of Tokenization 2024-10-28
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning 2024-10-28
Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions 2024-10-28