IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation (2024-10-10)
Aria: An Open Multimodal Native Mixture-of-Experts Model (2024-10-10)
Pixtral 12B (2024-10-10)
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation (2024-10-10)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate (2024-10-10)
Pyramidal Flow Matching for Efficient Video Generative Modeling (2024-10-10)
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning (2024-10-10)
Falcon Mamba: The First Competitive Attention-free 7B Language Model (2024-10-10)
MM-Ego: Towards Building Egocentric Multimodal LLMs (2024-10-10)
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization (2024-10-10)
Self-Boosting Large Language Models with Synthetic Preference Data (2024-10-10)
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation (2024-10-10)
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation (2024-10-10)
Temporal Reasoning Transfer from Text to Video (2024-10-10)
CursorCore: Assist Programming through Aligning Anything (2024-10-10)
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs (2024-10-10)
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler (2024-10-10)
Diversity-Rewarded CFG Distillation (2024-10-10)
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (2024-10-10)
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design (2024-10-10)