MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment 2024-10-10
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis 2024-10-10
VHELM: A Holistic Evaluation of Vision Language Models 2024-10-10
Does Spatial Cognition Emerge in Frontier Models? 2024-10-10
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering 2024-10-10
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks 2024-10-10
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling 2024-10-10
$\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization 2024-10-09
LongGenBench: Long-context Generation Benchmark 2024-10-09
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation 2024-10-09
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References 2024-10-09
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search 2024-10-09
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models 2024-10-09
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions 2024-10-09
ControlAR: Controllable Image Generation with Autoregressive Models 2024-10-09
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks 2024-10-09
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention 2024-10-09
EBES: Easy Benchmarking for Event Sequences 2024-10-09
Inference Scaling for Long-Context Retrieval Augmented Generation 2024-10-09
$ε$-VAE: Denoising as Visual Decoding 2024-10-09