E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding 2024-10-03
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control 2024-10-03
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation 2024-10-03
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios 2024-10-03
Old Optimizer, New Norm: An Anthology 2024-10-03
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs 2024-10-03
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration 2024-10-03
Selective Aggregation for Low-Rank Adaptation in Federated Learning 2024-10-03
Law of the Weakest Link: Cross Capabilities of Large Language Models 2024-10-02
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices 2024-10-02
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect 2024-10-02
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos 2024-10-02
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation 2024-10-02
Illustrious: an Open Advanced Illustration Model 2024-10-02
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer 2024-10-02
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs 2024-10-02
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration 2024-10-02
Visual Context Window Extension: A New Perspective for Long Video Understanding 2024-10-02
Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models 2024-10-02
DressRecon: Freeform 4D Human Reconstruction from Monocular Video 2024-10-02