IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding 2024-10-01
Emu3: Next-Token Prediction is All You Need 2024-09-30
MIO: A Foundation Model on Multimodal Tokens 2024-09-30
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models 2024-09-30
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation 2024-09-30
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult 2024-09-30
MinerU: An Open-Source Solution for Precise Document Content Extraction 2024-09-30
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making 2024-09-30
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows 2024-09-30
A Survey on the Honesty of Large Language Models 2024-09-30
LML: Language Model Learning a Dataset for Data-Augmented Prediction 2024-09-30
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models 2024-09-27
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness 2024-09-27
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions 2024-09-27
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction 2024-09-27
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction 2024-09-27
Pixel-Space Post-Training of Latent Diffusion Models 2024-09-27
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling 2024-09-27
Instruction Following without Instruction Tuning 2024-09-27
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction 2024-09-27