StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements 2024-12-12
I Don’t Know: Explicit Modeling of Uncertainty with an [IDK] Token 2024-12-12
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation 2024-12-12
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel 2024-12-12
STIV: Scalable Text and Image Conditioned Video Generation 2024-12-11
Evaluating and Aligning CodeLLMs on Human Preference 2024-12-11
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation 2024-12-11
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer 2024-12-11
Hidden in the Noise: Two-Stage Robust Watermarking for Images 2024-12-11
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models 2024-12-11
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics 2024-12-11
Mobile Video Diffusion 2024-12-11
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation 2024-12-11
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations 2024-12-11
Granite Guardian 2024-12-11
MoViE: Mobile Diffusion for Video Editing 2024-12-11
Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation 2024-12-11
Video Motion Transfer with Diffusion Transformers 2024-12-11
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models 2024-12-11
EMOv2: Pushing 5M Vision Model Frontier 2024-12-11