SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration 2024-11-21
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models 2024-11-21
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation 2024-11-21
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory 2024-11-21
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents 2024-11-21
Stylecodes: Encoding Stylistic Information For Image Generation 2024-11-21
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training 2024-11-21
Loss-to-Loss Prediction: Scaling Laws for All Datasets 2024-11-21
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models 2024-11-21
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation 2024-11-21
Generating Compositional Scenes via Text-to-image RGBA Instance Generation 2024-11-21
RedPajama: an Open Dataset for Training Large Language Models 2024-11-20
Continuous Speculative Decoding for Autoregressive Image Generation 2024-11-20
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements 2024-11-20
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations 2024-11-20
Soft Robotic Dynamic In-Hand Pen Spinning 2024-11-20
Building Trust: Foundations of Security, Safety and Transparency in AI 2024-11-20
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning 2024-11-20
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages 2024-11-20
Generative World Explorer 2024-11-19