Building and better understanding vision-language models: insights and future directions 2024-08-26
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? 2024-08-26
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation 2024-08-26
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time 2024-08-26
Memory-Efficient LLM Training with Online Subspace Descent 2024-08-26
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities 2024-08-26
T3M: Text Guided 3D Human Motion Synthesis from Speech 2024-08-26
A Web-Based Solution for Federated Learning with LLM-Based Automation 2024-08-26
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments 2024-08-26
CODE: Confident Ordinary Differential Editing 2024-08-26
FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering 2024-08-26
RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering 2024-08-26