ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds 2024-09-17
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models 2024-09-17
One missing piece in Vision and Language: A Survey on Comics Understanding 2024-09-17
jina-embeddings-v3: Multilingual Embeddings With Task LoRA 2024-09-17
On the Diagram of Thought 2024-09-17
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation 2024-09-17
AudioBERT: Audio Knowledge Augmented Language Model 2024-09-17
Towards Predicting Temporal Changes in a Patient’s Chest X-ray Images based on Electronic Health Records 2024-09-17
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study 2024-09-17
Breaking reCAPTCHAv2 2024-09-17
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems 2024-09-17
InstantDrag: Improving Interactivity in Drag-based Image Editing 2024-09-16
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos 2024-09-16
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis 2024-09-16
DrawingSpinUp: 3D Animation from Single Character Drawings 2024-09-16
Apollo: Band-sequence Modeling for High-Quality Audio Restoration 2024-09-16
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection 2024-09-16
Click2Mask: Local Editing with Dynamic Mask Generation 2024-09-16
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale 2024-09-13
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? 2024-09-13