AI Native Daily Paper Digest – 20250522

1. Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Keywords: Web-Shepherd, web navigation, process reward model, multimodal large language model
Category: Reinforcement Learning
Research Objective:
– Introduce Web-Shepherd, a process reward model aimed at improving accuracy and cost-effectiveness in web navigation.
Research Methods:
– Developed WebPRM Collection, a comprehensive dataset featuring 40K step-level preference pairs and annotated checklists.
– Introduced WebRewardBench, the first meta-evaluation benchmark for process reward models.
Research Conclusions:
– Web-Shepherd significantly outperformed existing methods, achieving about 30 points higher accuracy compared to GPT-4o on WebRewardBench.
– Demonstrated 10.9 points better performance with reduced cost on WebArena-lite, using GPT-4o-mini as the policy and Web-Shepherd as the verifier.
Paper link: https://huggingface.co/papers/2505.15277
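
Illustration: a minimal sketch of how a reward model could be scored on step-level preference pairs, which is one plausible way to read the meta-evaluation described above. The `PreferencePair` layout, `pairwise_accuracy`, and the word-overlap `toy_checklist_scorer` are hypothetical stand-ins, not the paper's data format or model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical layout: (task instruction, chosen step, rejected step).
PreferencePair = Tuple[str, str, str]


def pairwise_accuracy(score_step: Callable[[str, str], float],
                      pairs: Sequence[PreferencePair]) -> float:
    """Fraction of pairs where the chosen step scores strictly above the rejected step."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(
        1 for task, chosen, rejected in pairs
        if score_step(task, chosen) > score_step(task, rejected)
    )
    return correct / len(pairs)


def toy_checklist_scorer(task: str, step: str) -> float:
    """Stand-in scorer (assumption): counts step words that also appear in the task."""
    task_words = set(task.lower().split())
    return float(sum(1 for word in step.lower().split() if word in task_words))


pairs = [
    ("search for red shoes", "type red shoes into search box", "click the logout link"),
    ("open the cart page", "click the cart icon", "scroll to the footer"),
]
print(pairwise_accuracy(toy_checklist_scorer, pairs))
```

A real process reward model would replace `toy_checklist_scorer`; the accuracy computation itself stays the same.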

2. Scaling Law for Quantization-Aware Training
Keywords: Quantization-aware Training (QAT), Large Language Models (LLMs), Quantization Error, Mixed-Precision Quantization, Scaling Law
Category: Machine Learning
Research Objective:
– The paper proposes a unified scaling law for quantization-aware training (QAT) to model quantization error considering factors like model size, training data volume, and quantization group size, aiming to improve understanding and applicability of QAT at low precisions, particularly 4-bit.
Research Methods:
– Conducted 268 QAT experiments to study the behavior of quantization error, decomposing errors into weight and activation components to identify sensitivities and bottlenecks, specifically focusing on the W4A4 precision level.
Research Conclusions:
– The study concludes that quantization error diminishes with an increase in model size but increases with larger training data and coarser granularity. Mixed-precision quantization can mitigate bottlenecks, reaching similar error levels for weight and activation components. Eventually, more training data causes weight quantization error to surpass activation error, highlighting the need to reduce weight quantization error in extensive datasets.
Paper link: https://huggingface.co/papers/2505.14302
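
Illustration: a small sketch of why coarser quantization granularity tends to raise error. It applies symmetric per-group 4-bit quantize-dequantize to a fixed weight vector; this is plain post-hoc "fake quantization", not quantization-aware training, and the function names and example weights are assumptions made for the demonstration.

```python
from typing import List


def fake_quantize(weights: List[float], group_size: int, bits: int = 4) -> List[float]:
    """Symmetric per-group quantize-dequantize with one scale per group."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
        for w in group:
            q = max(qmin, min(qmax, round(w / scale)))  # round, then clamp to the 4-bit range
            out.append(q * scale)
    return out


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Two large outliers share a vector with many small weights.
weights = [0.1, -0.2, 0.15, 0.05, 4.0, -3.5, 0.12, -0.08]
fine = mse(weights, fake_quantize(weights, group_size=2))
coarse = mse(weights, fake_quantize(weights, group_size=8))
print(fine, coarse)
```

With one scale for all eight weights, the outliers set the step size and the small weights collapse to zero; with groups of two, the small weights get their own finer scale.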

3. MMaDA: Multimodal Large Diffusion Language Models
Keywords: Multimodal diffusion foundation model, Unified architecture, Text-to-image generation, Reinforcement learning, Generalization capabilities
Category: Multi-Modal Learning
Research Objective:
– To introduce a novel multimodal diffusion foundation model, MMaDA, designed for enhanced performance across textual reasoning, multimodal understanding, and text-to-image generation.
Research Methods:
– Development of a unified diffusion architecture with modality-agnostic design.
– Implementation of a mixed long chain-of-thought fine-tuning strategy.
– Proposal of UniGRPO, a unified policy-gradient-based reinforcement learning algorithm.
Research Conclusions:
– MMaDA demonstrates strong generalization capabilities, surpassing existing models in various tasks, and effectively bridges pretraining and post-training within unified diffusion architectures.
Paper link: https://huggingface.co/papers/2505.15809

4. UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Keywords: reinforcement learning, visual grounding, multimodal large language model, reasoning, difficulty bias
Category: Multi-Modal Learning
Research Objective:
– To address universal visual grounding challenges by enhancing reasoning abilities in multimodal contexts with the model UniVG-R1.
Research Methods:
– Constructing a Chain-of-Thought dataset for supervised fine-tuning.
– Employing rule-based reinforcement learning alongside a difficulty-aware weight adjustment strategy.
Research Conclusions:
– UniVG-R1 outperforms existing models, achieving state-of-the-art results on MIG-Bench with a 9.1% improvement.
– Demonstrates significant generalizability with a 23.4% improvement in zero-shot performance across diverse benchmarks.
Paper link: https://huggingface.co/papers/2505.14231
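
Illustration: a rule-based grounding reward is commonly built on intersection-over-union (IoU) between a predicted and a reference box. The sketch below assumes that form; the `difficulty_weight` multiplier is a hypothetical placeholder for the difficulty-aware adjustment, whose exact definition is not given in this digest.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred: Box, gold: Box, difficulty_weight: float = 1.0) -> float:
    """IoU reward scaled by a per-sample weight (the weighting scheme is an assumption)."""
    return difficulty_weight * iou(pred, gold)


print(grounding_reward((0, 0, 2, 2), (1, 1, 3, 3)))
```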

5. Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Keywords: diffusion language models, large language model, text retrieval, bidirectional architecture, document retrieval
Category: Natural Language Processing
Research Objective:
– To explore the performance of diffusion language models for text embeddings, particularly in document retrieval and reasoning-intensive tasks.
Research Methods:
– Conducted a systematic comparison of diffusion language models with large language model-based embeddings, emphasizing the bidirectional architecture.
Research Conclusions:
– Diffusion language models surpass LLM embeddings, improving performance by 20% on long-document retrieval and showcasing the importance of bidirectional attention for encoding global context.
Paper link: https://huggingface.co/papers/2505.15045

6. Efficient Agent Training for Computer Use
Keywords: PC Agent-E, trajectory synthesis, data efficiency, human-like computer use
Category: Reinforcement Learning
Research Objective:
– The primary goal is to improve data efficiency and achieve superior performance in human-like computer use tasks through enhanced trajectory synthesis and training with the PC Agent-E framework.
Research Methods:
– Use of the PC Agent-E framework, starting with 312 human-annotated trajectories.
– Enhancement of data quality through diverse action decision synthesis using Claude 3.7 Sonnet.
Research Conclusions:
– PC Agent-E achieved a 141% relative improvement on the WindowsAgentArena-V2 benchmark.
– Demonstrated strong generalizability to different operating systems on OSWorld, showing that strong computer use capabilities can arise from a small amount of high-quality trajectory data.
Paper link: https://huggingface.co/papers/2505.13909

7. This Time is Different: An Observability Perspective on Time Series Foundation Models
Keywords: Time Series Forecasting, Decoder-Only Architecture, Observability Data, State-of-the-Art Performance
Category: Machine Learning
Research Objective:
– Introduce Toto, a time series forecasting foundation model utilizing a decoder-only architecture, designed to tackle challenges in multivariate observability data.
Research Methods:
– Developed with 151 million parameters and pre-trained using a diverse corpus including observability, open, and synthetic data.
– Introduced the BOOM benchmark, comprising 350 million observations across 2,807 real-world time series sourced from Datadog's telemetry and observability metrics.
Research Conclusions:
– Toto achieves state-of-the-art performance on both the BOOM benchmark and other established time series forecasting benchmarks, with all related resources available as open source.
Paper link: https://huggingface.co/papers/2505.14766

8. Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Keywords: Large Reasoning Models, Reinforcement Learning, Length-based Reward Shaping, LASER-D, Difficulty-aware
Category: Reinforcement Learning
Research Objective:
– To enhance reasoning efficiency and performance in Large Reasoning Models (LRMs) through RL-based reward shaping methods like LASER-D that adapt to difficulty and reduce redundancy.
Research Methods:
– Proposing a novel Length-bAsed StEp Reward shaping method (LASER) using a step function as the reward system, and introducing LASER-D which is dynamic and difficulty-aware for better trade-offs between reasoning performance and efficiency.
Research Conclusions:
– LASER-D significantly improves reasoning performance, achieving a +6.1 improvement on AIME2024 while reducing token use by 63%, and produces more concise reasoning patterns with less redundancy.
Paper link: https://huggingface.co/papers/2505.15612
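
Illustration: a rough sketch of a step-function length reward in the spirit of the method described above. The exact reward form, the `bonus` value, and the per-difficulty targets in `TARGET_BY_DIFFICULTY` are assumptions chosen for the example, not values from the paper.

```python
def step_length_reward(is_correct: bool, num_tokens: int,
                       target_length: int, bonus: float = 0.5) -> float:
    """Correctness reward plus a fixed bonus when a correct answer fits the target length."""
    correctness = 1.0 if is_correct else 0.0
    # Step function: the bonus is all-or-nothing, not proportional to length.
    length_bonus = bonus if (is_correct and num_tokens <= target_length) else 0.0
    return correctness + length_bonus


# Hypothetical difficulty-aware targets: harder problems get a longer budget.
TARGET_BY_DIFFICULTY = {"easy": 1000, "medium": 2000, "hard": 4000}


def difficulty_aware_reward(is_correct: bool, num_tokens: int, difficulty: str) -> float:
    """Same step reward, with the target length looked up from the difficulty bucket."""
    return step_length_reward(is_correct, num_tokens, TARGET_BY_DIFFICULTY[difficulty])


print(step_length_reward(True, 500, 1000))             # correct and short
print(step_length_reward(True, 1500, 1000))            # correct but over the target
print(difficulty_aware_reward(True, 1500, "hard"))     # same length, larger budget
```

Because incorrect answers never earn the bonus in this sketch, the policy is not rewarded for being short at the expense of being right.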

9. Vid2World: Crafting Video Diffusion Models to Interactive World Models
Keywords: Vid2World, Video Diffusion Models, World Models, Causal Action Guidance, Autoregressive Generation
Category: Generative Models
Research Objective:
– The research aims to repurpose pre-trained video diffusion models into interactive world models to enhance action controllability and scalability in complex environments.
Research Methods:
– Vid2World utilizes causalization of pre-trained video diffusion models and introduces a causal action guidance mechanism for autoregressive generation.
Research Conclusions:
– The method proves effective in transforming video diffusion models into interactive world models, as demonstrated by experiments in robot manipulation and game simulation domains.
Paper link: https://huggingface.co/papers/2505.14357

10. When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
Keywords: ASRR, Large Reasoning Models, Redundant Reasoning, Efficiency, Safety
Category: Knowledge Representation and Reasoning
Research Objective:
– The objective of the study is to optimize the reasoning efficiency of Large Reasoning Models (LRMs) by suppressing redundant reasoning without harming performance or safety.
Research Methods:
– The study introduces the Adaptive Self-Recovery Reasoning (ASRR) framework, which regulates the allocation of reasoning effort based on problem difficulty using accuracy-aware length reward regulation.
Research Conclusions:
– ASRR reduces the reasoning computational load by up to 32.5% and 25.7% for different model sizes with minimal accuracy loss, and notably raises harmless rates on safety benchmarks, showcasing its potential for efficient, adaptive, and safer reasoning in LRMs.
Paper link: https://huggingface.co/papers/2505.15400

11. lmgame-Bench: How Good are LLMs at Playing Games?
Keywords: lmgame-Bench, large language models, Reinforcement Learning, Gym-style API, data contamination
Category: Reinforcement Learning
Research Objective:
– To evaluate the capabilities of large language models using video games that require perception, memory, and planning, addressing issues with brittle vision perception, prompt sensitivity, and data contamination.
Research Methods:
– Introduction of lmgame-Bench, featuring platformer, puzzle, and narrative games through a unified Gym-style API, incorporating lightweight perception and memory scaffolds to stabilize prompt variance and eliminate data contamination.
Research Conclusions:
– Demonstrated that lmgame-Bench challenges models while still differentiating between them, that its games probe distinct capabilities, and that skills learned on it can transfer to unseen games and external planning tasks.
Paper link: https://huggingface.co/papers/2505.15146
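
Illustration: a toy environment showing what a "Gym-style" interface usually means, namely `reset()` returning an observation and `step(action)` returning observation, reward, terminated, truncated, and info. `CountdownEnv` and `run_episode` are hypothetical and are not part of lmgame-Bench; the `policy` callable marks where a language model call would go.

```python
from typing import Any, Callable, Dict, Tuple


class CountdownEnv:
    """Toy text game: bring a counter to zero by subtracting 1 or 2 per turn."""

    def __init__(self, start: int = 5, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.value = start
        self.steps = 0

    def _observe(self) -> str:
        return f"value={self.value}"

    def reset(self) -> Tuple[str, Dict[str, Any]]:
        self.value = self.start
        self.steps = 0
        return self._observe(), {}

    def step(self, action: str) -> Tuple[str, float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        amount = {"minus_one": 1, "minus_two": 2}.get(action, 0)  # unknown actions do nothing
        self.value = max(0, self.value - amount)
        terminated = self.value == 0
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._observe(), reward, terminated, truncated, {"steps": self.steps}


def run_episode(env: CountdownEnv, policy: Callable[[str], str]) -> float:
    """Roll out one episode; `policy` maps a text observation to an action string."""
    obs, _ = env.reset()
    total = 0.0
    while True:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return total


print(run_episode(CountdownEnv(), lambda obs: "minus_two"))
```

Keeping every game behind the same two methods is what lets one evaluation loop drive platformer, puzzle, and narrative games alike.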

12. Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
Keywords: Deliberation over Priors, Large Language Models, Knowledge Graphs, Trustworthy Reasoning, Progressive Knowledge Distillation
Category: Knowledge Representation and Reasoning
Research Objective:
– The research aims to enhance the trustworthiness of Large Language Models (LLMs) by integrating structural and constraint priors from knowledge graphs.
Research Methods:
– The study proposes a framework named Deliberation over Priors (DP), using progressive knowledge distillation and reasoning-introspection strategies.
Research Conclusions:
– The Deliberation over Priors framework demonstrates state-of-the-art performance, improving reliability and faithfulness, evidenced by a 13% Hit@1 improvement on the ComplexWebQuestions dataset.
Paper link: https://huggingface.co/papers/2505.15210
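
Illustration: Hit@1, the metric quoted above, is the share of questions whose top-ranked prediction is among the gold answers. A minimal sketch with made-up example data:

```python
from typing import Sequence, Set


def hit_at_1(ranked_predictions: Sequence[Sequence[str]],
             gold_answers: Sequence[Set[str]]) -> float:
    """Fraction of questions whose first prediction is a gold answer."""
    if not gold_answers or len(ranked_predictions) != len(gold_answers):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for preds, gold in zip(ranked_predictions, gold_answers)
        if preds and preds[0] in gold   # an empty prediction list counts as a miss
    )
    return hits / len(gold_answers)


predictions = [["Paris", "Lyon"], ["Rome"], []]
gold = [{"Paris"}, {"Milan"}, {"Oslo"}]
print(hit_at_1(predictions, gold))
```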

13. Constructing a 3D Town from a Single Image
Keywords: 3DTown, 3D Scenes, Generative Models, Spatial Coherence, Texture Fidelity
Category: Computer Vision
Research Objective:
– To develop a training-free framework called 3DTown for generating realistic 3D scenes from a single top-down image using region-based generation and spatial-aware 3D inpainting techniques.
Research Methods:
– Decompose the input image into overlapping regions and generate each using a pretrained 3D object generator.
– Utilize a masked rectified flow inpainting process for maintaining structural continuity and filling in missing geometry.
Research Conclusions:
– 3DTown achieves high-quality geometry, spatial coherence, and texture fidelity and outperforms state-of-the-art models in generating 3D towns from a single image without requiring 3D supervision or fine-tuning.
Paper link: https://huggingface.co/papers/2505.15765

14. IA-T2I: Internet-Augmented Text-to-Image Generation
Keywords: Internet-Augmented, Text-to-Image, Reference Images, AI-Generated Summary, Generative Models
Category: Generative Models
Research Objective:
– The paper aims to enhance text-to-image (T2I) generation models when dealing with uncertain knowledge by integrating Internet-Augmented references.
Research Methods:
– An Internet-Augmented framework is introduced with an active retrieval module, hierarchical image selection module, and a self-reflection mechanism to improve T2I model outputs.
Research Conclusions:
– The proposed framework significantly enhances image generation accuracy for prompts involving uncertain knowledge, outperforming GPT-4o by approximately 30% in human preference tests.
Paper link: https://huggingface.co/papers/2505.15779

15. How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
Keywords: Large Reasoning Models, Supervised Fine-Tuning, Safety Improvements, Reasoning Process
Category: Knowledge Representation and Reasoning
Research Objective:
– To enhance the safety of Large Reasoning Models (LRMs) using Supervised Fine-Tuning (SFT).
Research Methods:
– Explicitly addressed failure patterns during data distillation and evaluated whether complex reasoning processes are necessary.
Research Conclusions:
– Simpler reasoning processes can improve safety without complex reasoning chains, and using math reasoning data during fine-tuning helps balance safety and over-refusal.
Paper link: https://huggingface.co/papers/2505.15404

16. Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Keywords: open-source LLMs, fine-tuning, backdoor training, black-box access, data breach
Category: Natural Language Processing
Research Objective:
– To identify and address the risk of private downstream fine-tuning data extraction from open-source LLMs through backdoor training.
Research Methods:
– Conducted comprehensive experiments on 4 open-source models of varying parameter counts and on multiple datasets to evaluate extraction performance.
Research Conclusions:
– It is possible to extract a high percentage of fine-tuning data (up to 94.9%) from downstream models using backdoor training, posing a significant data breach risk.
– Detection-based defense strategies can be circumvented with more advanced attacks, highlighting the need for further research to mitigate this risk.
Paper link: https://huggingface.co/papers/2505.15656

17. dKV-Cache: The Cache for Diffusion Language Models
Keywords: Diffusion Language Models, delayed KV-Cache, non-autoregressive architecture, bidirectional attention, speedup
Category: Natural Language Processing
Research Objective:
– To accelerate the inference process of Diffusion Language Models (DLMs) without significant performance loss using a KV-cache-like mechanism called delayed KV-Cache.
Research Methods:
– Introduced a delayed caching strategy for DLMs with two variants: dKV-Cache-Decode and dKV-Cache-Greedy, allowing for improved performance and speed-up in the denoising process.
Research Conclusions:
– Delayed KV-Cache achieves significant speed-ups of 2-10x, effectively bridging the performance gap between autoregressive models and DLMs, and enables training-free application across various benchmarks.
Paper link: https://huggingface.co/papers/2505.15781

18. Learning to Reason via Mixture-of-Thought for Logical Reasoning
Keywords: Mixture-of-Thought, logical reasoning, natural language, code, symbolic logic
Category: Multi-Modal Learning
Research Objective:
– The primary objective is to enhance logical reasoning capabilities in Language Models by enabling reasoning across multiple modalities: natural language, code, and symbolic logic.
Research Methods:
– The study introduces a Mixture-of-Thought (MoT) framework with a two-phase design, incorporating self-evolving training and inference using natural language, code, and a truth-table symbolic modality to improve reasoning accuracy.
Research Conclusions:
– The MoT framework significantly advances logical reasoning performance, outperforming single-modality approaches by up to 11.7 percentage points in accuracy, and is particularly effective for complex reasoning tasks.
Paper link: https://huggingface.co/papers/2505.15817

19. BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Keywords: BARREL, Large Reasoning Models, overconfidence, factual reasoning, DeepSeek
Category: Knowledge Representation and Reasoning
Research Objective:
– To address overconfidence in Large Reasoning Models by promoting concise and factual reasoning through the novel framework, BARREL.
Research Methods:
– The proposal and implementation of the BARREL framework, focusing on concise and boundary-aware reasoning to reduce errors in reasoning patterns like last-minute guessing and second-thought spiraling.
Research Conclusions:
– BARREL-training significantly improves the reliability of the model DeepSeek-R1-Distill-Llama-8B from 39.33% to 61.48%, maintaining accuracy comparable to models trained on reasoning data from R1, highlighting potential for more factual System 2 LRMs.
Paper link: https://huggingface.co/papers/2505.13529

20. RLVR-World: Training World Models with Reinforcement Learning
Keywords: RLVR-World, Reinforcement Learning, Verifiable Rewards, World Models
Category: Reinforcement Learning
Research Objective:
– To optimize world models using reinforcement learning with verifiable rewards for task-specific metrics in language and video domains.
Research Methods:
– Introduction of RLVR-World, a framework leveraging reinforcement learning to align world model objectives with task-specific goals using verifiable rewards.
Research Conclusions:
– RLVR-World demonstrates substantial performance improvements in models across different domains, including text games, web navigation, and robot manipulation, offering a promising post-training paradigm for generative models.
Paper link: https://huggingface.co/papers/2505.13934

21. ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
Keywords: Conversational Query Reformulation, Reinforcement Learning, Self-Distillation, Retrieval Signals
Category: Natural Language Processing
Research Objective:
– To enhance conversational query reformulation by eliminating dependency on external supervision and improving alignment with retrievers.
Research Methods:
– Utilized reinforcement learning along with a novel self-driven policy warm-up and retrieval-guided self-distillation to optimize query reformulation.
Research Conclusions:
– ConvSearch-R1 significantly outperforms state-of-the-art methods, with over a 10% performance improvement on the challenging TopiOCQA dataset, using a smaller model and without external supervision.
Paper link: https://huggingface.co/papers/2505.15776

22. Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Keywords: Soft Thinking, token embeddings, continuous concept space, Chain-of-Thought, pass@1 accuracy
Category: Knowledge Representation and Reasoning
Research Objective:
– Introduce Soft Thinking, a training-free method that emulates human-like reasoning with soft, abstract concept tokens.
Research Methods:
– Employs probability-weighted mixtures of token embeddings to form a continuous concept space, enabling richer representations.
Research Conclusions:
– Improves pass@1 accuracy by up to 2.48 points and reduces token usage by up to 22.4% compared to standard Chain-of-Thought methods.
Paper link: https://huggingface.co/papers/2505.15778
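
Illustration: a "probability-weighted mixture of token embeddings" can be read as taking the softmax over next-token logits and using it to average the embedding rows. The sketch below shows only that arithmetic on a toy two-token vocabulary; the function names and the embedding table are assumptions, and any filtering or stopping rules the method may use are not modeled.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def concept_token(logits: List[float], embeddings: List[List[float]]) -> List[float]:
    """Expected embedding under the next-token distribution (one row per vocabulary item)."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * row[d] for p, row in zip(probs, embeddings)) for d in range(dim)]


# Toy vocabulary of two tokens with orthogonal embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
print(concept_token([0.0, 0.0], embeddings))   # equal logits give an equal blend
print(concept_token([10.0, 0.0], embeddings))  # a confident distribution stays near token 0
```

Feeding this blended vector back as the next input, instead of a single sampled token's embedding, is what keeps the alternatives "alive" during reasoning.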

23. Text Generation Beyond Discrete Token Sampling
Keywords: Mixture of Inputs (MoI), autoregressive generation, Bayesian estimation, text quality, reasoning capabilities
Category: Natural Language Processing
Research Objective:
– To enhance autoregressive generation by maintaining a richer internal representation using a training-free method.
Research Methods:
– The proposed Mixture of Inputs (MoI) method combines the generated discrete token with the previously discarded token distribution using Bayesian estimation.
Research Conclusions:
– MoI improves text quality and reasoning capabilities, showing consistent performance enhancements in mathematical reasoning, code generation, and PhD-level QA tasks without additional training or significant computational overhead.
Paper link: https://huggingface.co/papers/2505.14827
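
Illustration: one way to combine a sampled token with the distribution it came from is a Dirichlet-multinomial update, treating the distribution as the prior and the sampled token as a single observation. This is a sketch under that assumption; the `concentration` parameter and function names are hypothetical, and the paper's actual estimator may differ.

```python
from typing import List


def mixture_weights(probs: List[float], sampled_index: int,
                    concentration: float = 1.0) -> List[float]:
    """Posterior mean under a Dirichlet(concentration * probs) prior after one observed token."""
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    denom = concentration + 1.0
    return [
        (concentration * p + (1.0 if i == sampled_index else 0.0)) / denom
        for i, p in enumerate(probs)
    ]


def mixed_input(probs: List[float], sampled_index: int,
                embeddings: List[List[float]], concentration: float = 1.0) -> List[float]:
    """Input vector for the next step: embedding rows averaged with the posterior weights."""
    weights = mixture_weights(probs, sampled_index, concentration)
    dim = len(embeddings[0])
    return [sum(w * row[d] for w, row in zip(weights, embeddings)) for d in range(dim)]


print(mixture_weights([0.5, 0.5], sampled_index=0))
print(mixed_input([0.5, 0.5], 0, [[1.0, 0.0], [0.0, 1.0]]))
```

A small `concentration` pushes the input toward the sampled token alone, while a large one pushes it toward the full distribution.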

24. AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use
Keywords: AI Native, STEM images, AutoMat, atomistic simulation, physical properties
Category: AI Systems and Tools
Research Objective:
– To introduce AutoMat, an agent-assisted pipeline that automatically transforms scanning transmission electron microscopy (STEM) images into atomic crystal structures and predicts physical properties.
Research Methods:
– Utilizes pattern-adaptive denoising, physics-guided template retrieval, symmetry-aware atomic reconstruction, fast relaxation, and property prediction via MatterSim.
Research Conclusions:
– AutoMat significantly outperforms existing multimodal large language models and tools in large-scale experiments with over 450 structure samples, demonstrating the potential to bridge microscopy and atomistic simulation in materials science.
Paper link: https://huggingface.co/papers/2505.12650

25. Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs
Keywords: Bias in Large Language Models, BiasLens, Concept Activation Vectors, Sparse Autoencoders, AI Ethics and Fairness
Category: AI Ethics and Fairness
Research Objective:
– The paper aims to analyze bias in large language models (LLMs) without relying on labeled data using a new framework, BiasLens.
Research Methods:
– BiasLens leverages Concept Activation Vectors and Sparse Autoencoders to extract interpretable concept representations and measures variation in representational similarity to quantify bias.
Research Conclusions:
– BiasLens effectively identifies previously undetected forms of bias with high agreement to traditional metrics, offering a scalable, interpretable, and efficient method for bias discovery in LLMs.
Paper link: https://huggingface.co/papers/2505.15524

26. VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Keywords: VerifyBench, VerifyBench-Hard, Reinforcement Learning, Reference-based Reward Systems
Category: Reinforcement Learning
Research Objective:
– Introduce two new benchmarks, VerifyBench and VerifyBench-Hard, to evaluate the accuracy of reference-based reward systems in reinforcement learning for reasoning tasks.
Research Methods:
– Meticulous data collection and curation followed by careful human annotation for benchmark construction.
Research Conclusions:
– Current reasoning models show considerable room for improvement on these benchmarks, especially smaller-scale models, offering insights for developing verifier accuracy and reasoning capabilities.
Paper link: https://huggingface.co/papers/2505.15801

27. RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Keywords: Reinforcement Learning, Large Language Models, Generative, Verifier, Process-Level
Category: Reinforcement Learning
Research Objective:
– To develop a novel RL framework, Tango, which simultaneously trains a generative LLM and an RL-trained verifier for enhanced robustness and generalization.
Research Methods:
– Implementation of Tango framework where both LLM generator and verifier are trained in an interleaved RL manner, with innovations in a process-level generative verifier.
Research Conclusions:
– Tango achieves state-of-the-art performance on math benchmarks and out-of-domain reasoning tasks, demonstrating superior generalization and robustness compared to traditional methods.
Paper link: https://huggingface.co/papers/2505.15034

28. PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration
Keywords: Information-theoretical framework, Automated scientific discovery, Uncertainty reduction, Large Language Model, Plug-and-Play method
Category: AI Systems and Tools
Research Objective:
– To improve automated scientific discovery by reducing uncertainty and enhancing solution quality using PiFlow, an information-theoretical framework.
Research Methods:
– Developed PiFlow to treat automated scientific discovery as a structured uncertainty reduction problem across three scientific domains, focusing on nanomaterial structures, bio-molecules, and superconductors with specific properties.
Research Conclusions:
– PiFlow significantly enhances discovery efficiency, improving AUC by 73.55% and solution quality by 94.06% compared to traditional systems, establishing a new paradigm for AI-driven research.
Paper link: https://huggingface.co/papers/2505.15047

29. Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
Keywords: Large Audio Language Models, jailbreak vulnerabilities, adversarial audio prompts, semantic consistency, AI Ethics
Category: AI Ethics and Fairness
Research Objective:
– The research aims to systematically evaluate the jailbreak vulnerabilities of Large Audio Language Models (LAMs) using a benchmark named AJailBench.
Research Methods:
– Introduced AJailBench-Base, a dataset with 1,495 adversarial audio prompts converted from textual jailbreak attacks.
– Developed an Audio Perturbation Toolkit (APT) that applies targeted distortions across time, frequency, and amplitude domains, enforcing semantic consistency with Bayesian optimization.
Research Conclusions:
– Reveals that existing LAMs do not consistently resist jailbreak attacks.
– Demonstrates that small, semantically preserved perturbations can significantly compromise LAMs’ safety, emphasizing the necessity for more robust, semantically aware defense mechanisms.
Paper link: https://huggingface.co/papers/2505.15406

30. VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL
Keywords: Reinforcement Learning, Diffusion Models, Value Function, Training Efficiency, Non-Differentiable Rewards
Category: Generative Models
Research Objective:
– Introduce VARD, a value function-based reinforcement learning approach to improve diffusion models with dense supervision and efficient handling of non-differentiable rewards.
Research Methods:
– Utilize a value function for predicting expected rewards from intermediate states combined with KL regularization to provide dense supervision throughout the generation process.
Research Conclusions:
– VARD enhances trajectory guidance and training efficiency, extending the use of reinforcement learning to complex diffusion models while maintaining stability and proximity to pre-trained models.
Paper link: https://huggingface.co/papers/2505.15791

31. Prior Prompt Engineering for Reinforcement Fine-Tuning
Keywords: Prior Prompt Engineering, Reinforcement Fine-Tuning, Language Models, Reward Signals
Category: Reinforcement Learning
Research Objective:
– The study aims to explore how prior prompt engineering (pPE) can guide language models to internalize distinct behaviors through reinforcement fine-tuning (RFT).
Research Methods:
– The paper investigates five representative inference-time prompt engineering strategies, translates them into pPE approaches, and tests these on the Qwen2.5-7B language model across various benchmarks.
Research Conclusions:
– All pPE-trained models outperform their inference-time prompt engineering counterparts, with the null-example pPE approach showing the most significant performance gains. Additionally, different pPE strategies result in distinct behavioral styles, highlighting pPE as a powerful and underexplored aspect of RFT.
Paper link: https://huggingface.co/papers/2505.14157

32. WebNovelBench: Placing LLM Novelists on the Web Novel Distribution
Keywords: WebNovelBench, LLM, narrative quality dimensions, LLM-as-Judge, narrative generation
Category: Generative Models
Research Objective:
– Introduce WebNovelBench, a benchmark to evaluate long-form storytelling abilities of Large Language Models using Chinese web novels.
Research Methods:
– Employ a large-scale dataset with over 4,000 Chinese web novels; use an LLM-as-Judge framework to assess eight narrative quality dimensions through Principal Component Analysis.
Research Conclusions:
– WebNovelBench differentiates effectively between human-written and LLM-generated narratives, providing insights for LLM storytelling improvement.
Paper link: https://huggingface.co/papers/2505.14818

33. Streamline Without Sacrifice – Squeeze out Computation Redundancy in LMM
Keywords: ProxyV, vision tokens, multimodal models, computation-level redundancy
Category: Multi-Modal Learning
Research Objective:
– The objective is to enhance efficiency in large multimodal models by reducing computational redundancy on vision tokens using ProxyV.
Research Methods:
– A series of experiments were conducted to identify vision-related computation redundancy and to develop ProxyV for efficient processing.
Research Conclusions:
– ProxyV successfully alleviates computational burdens while maintaining, or even enhancing, model performance. It also integrates flexibly with token reduction methods for further efficiency gains.
Paper link: https://huggingface.co/papers/2505.15816

34. DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Keywords: Diffusion ConvNet (DiCo), channel attention mechanism, Convolution, FID, ImageNet
Category: Generative Models
Research Objective:
– The study aims to enhance efficiency in visual generation tasks by introducing Diffusion ConvNet (DiCo) with a compact channel attention mechanism, as a more efficient alternative to Diffusion Transformer (DiT).
Research Methods:
– Utilizes standard ConvNet modules enhanced with a compact channel attention mechanism to address channel redundancy issues, leading to efficient and expressive diffusion models.
Research Conclusions:
– Diffusion ConvNet (DiCo) demonstrates superior image quality and generation speed on class-conditional ImageNet benchmarks, outperforming previous diffusion models significantly, as evidenced by improved FID scores and speedup rates.
Paper link: https://huggingface.co/papers/2505.11196
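
Illustration: channel attention in its most compact form is often built as global average pooling, a 1x1 convolution across channels, a sigmoid, and a per-channel rescale. The sketch below assumes that structure and uses plain nested lists; it is not taken from the DiCo code, and the `weight` and `bias` arguments are hypothetical parameters.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def compact_channel_attention(x: FeatureMap,
                              weight: List[List[float]],
                              bias: List[float]) -> FeatureMap:
    """Rescale each channel by a gate computed from globally pooled channel statistics."""
    channels = len(x)
    # 1) Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # 2) A 1x1 convolution on a 1x1 map is a matrix-vector product across channels.
    mixed = [sum(w * p for w, p in zip(weight[c], pooled)) + bias[c] for c in range(channels)]
    # 3) Sigmoid turns each value into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    # 4) Multiply every spatial position of a channel by that channel's gate.
    return [[[gates[c] * v for v in row] for row in x[c]] for c in range(channels)]


x = [[[2.0, 4.0]], [[6.0, 8.0]]]             # 2 channels, 1x2 spatial grid
zero_w = [[0.0, 0.0], [0.0, 0.0]]
print(compact_channel_attention(x, zero_w, [0.0, 0.0]))  # zero weights give gates of 0.5
```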

35. HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Keywords: Human Centered AI, Ethics, Empathy, Inclusivity, HumaniBench
Category: AI Ethics and Fairness
Research Objective:
– Introduce HumaniBench, a comprehensive benchmark evaluating large multimodal models (LMMs) based on seven human-centered AI principles.
Research Methods:
– Utilizes 32K real-world image-question pairs annotated by a scalable GPT-based pipeline and verified by domain experts.
Research Conclusions:
– Proprietary models generally outperform open-source counterparts, though both struggle with robustness and visual grounding, highlighting existing alignment gaps with human-aligned principles.
Paper link: https://huggingface.co/papers/2505.11454

36. MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations
Keywords: Large Language Models, hallucinations, Knowledge Graphs, multilingual, MultiHal
Category: Natural Language Processing
Research Objective:
– Evaluate and mitigate hallucinations in Large Language Models by developing a multilingual, multihop benchmark grounded in Knowledge Graphs.
Research Methods:
– Developed a new benchmark, MultiHal, by mining and curating high-quality KG-paths from open-domain Knowledge Graphs for generative text evaluation across multiple languages.
Research Conclusions:
– Experiments on MultiHal show a significant improvement in semantic similarity scores when Knowledge Graph information is integrated, indicating potential for improving factuality in language models and encouraging future research on hallucination mitigation and fact-checking.
Paper link: https://huggingface.co/papers/2505.14101

37. Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
Keywords: Multimodal LLM, Sparse Mixture of Projectors, modality-specific routers, inference costs, noise robustness
Category: Multi-Modal Learning
Research Objective:
– Introduce Llama-SMoP, an efficient Multimodal LLM that incorporates a Sparse Mixture of Projectors to improve Audio-Visual Speech Recognition (AVSR) performance without increasing inference costs.
Research Methods:
– Llama-SMoP uses a Sparse Mixture of Projectors (SMoP) module built from sparsely-gated mixture-of-experts projectors. The study explores three SMoP configurations to identify the best-performing setup.
Research Conclusions:
– Llama-SMoP with the Disjoint-Experts and Disjoint-Routers configuration achieves superior performance on ASR, VSR, and AVSR tasks. Ablation studies confirm its benefits in expert activation, scalability, and robustness to noise.
Paper link: https://huggingface.co/papers/2505.14336
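
Illustrative sketch (not from the paper): sparsely-gated routing over a pool of projectors can be shown for a single token. A router scores every projector, only the top-k are run, and their outputs are blended with softmax weights over the selected scores. The names `top_k_gate` and `sparse_mixture`, the choice of k, and the representation of projectors as plain Python callables are all assumptions made for illustration.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k):
    """Return {projector_index: weight} for the k highest-scoring projectors.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def sparse_mixture(token, projectors, router_logits, k=2):
    """Project one token vector using only the k selected projectors."""
    gate = top_k_gate(router_logits, k)
    out = None
    for idx, weight in gate.items():
        projected = projectors[idx](token)  # unselected projectors never run
        if out is None:
            out = [weight * p for p in projected]
        else:
            out = [o + weight * p for o, p in zip(out, projected)]
    return out, gate
```

Because only k projectors run per token, the pool can grow without a matching growth in per-token compute. In a setup with modality-specific routers, audio and video tokens would each get their own router logits; the sketch covers one modality.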

38. BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Keywords: BLEU, Reward Models, Instruction-Following, Group Relative Policy Optimization, Factually Grounded
Category: Reinforcement Learning
Research Objective:
– Explore whether simpler, reference-based metrics such as BLEU can serve as viable alternatives to reward models when aligning language models with human preferences.
Research Methods:
– Developed BLEUBERI, which uses BLEU as the reward function and trains models on instruction-following datasets with Group Relative Policy Optimization (GRPO).
Research Conclusions:
– BLEUBERI-trained models match the quality of models trained with reward model-guided RL and produce more factually grounded outputs, suggesting that string matching-based metrics can be effective, cost-efficient proxies for reward models in alignment.
Paper link: https://huggingface.co/papers/2505.11080
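
Illustrative sketch (not from the paper): the idea of using a reference-based metric as a reward can be shown by scoring a group of sampled responses against a reference and normalizing the scores within the group, as group-relative methods do. The BLEU below is a simplified sentence-level version (up to bigrams, whitespace tokens, no smoothing) and is an assumption, not the scorer used by the authors; `simple_bleu` and `group_relative_advantages` are hypothetical names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by how often it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]
```

Responses that overlap more with the reference than their siblings receive positive advantages, so no learned reward model is needed to rank them.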

39. Language Specific Knowledge: Do Models Know Better in X than in English?
Keywords: Language Specific Knowledge, Chain-of-thought Reasoning, Code-switching, Low-resource Languages
Category: Natural Language Processing
Research Objective:
– Investigate whether language models hold more knowledge about certain topics in specific languages, and whether reasoning in those languages improves accuracy.
Research Methods:
– Developed LSKExtractor, a methodology to benchmark and exploit the Language Specific Knowledge present in language models, supported by culture-specific datasets.
Research Conclusions:
– Language models showed a 10% average improvement in accuracy when using chain-of-thought reasoning in certain languages, particularly low-resource ones, demonstrating the value of Language Specific Knowledge.
Paper link: https://huggingface.co/papers/2505.14990

40. The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Keywords: Entropy Minimization, Large Language Models, Math, Physics, Coding
Category: Natural Language Processing
Research Objective:
– Improve the performance of Large Language Models (LLMs) on math, physics, and coding tasks through entropy minimization, without using labeled data.
Research Methods:
– Examines three approaches: EM-FT, which minimizes token-level entropy; EM-RL, where negative entropy is the sole reinforcement reward; and EM-INF, which adjusts logits at inference time.
Research Conclusions:
– Entropy minimization can significantly improve LLMs' performance on complex tasks without labeled data and, in the case of EM-INF, without parameter updates, achieving performance comparable or superior to strong RL baselines.
Paper link: https://huggingface.co/papers/2505.15134
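
Illustrative sketch (not from the paper): token-level entropy is the Shannon entropy of the next-token distribution, and it can be lowered at inference time by nudging the logits themselves while leaving model parameters untouched. The gradient-descent update below is an assumed stand-in for inference-time logit adjustment, not the authors' exact procedure; `minimize_entropy`, the step count, and the learning rate are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def minimize_entropy(logits, steps=20, lr=0.5):
    """Run a few gradient-descent steps on the logits to reduce entropy.

    For p = softmax(z) and H = entropy(p), dH/dz_i = -p_i * (log p_i + H),
    so stepping against the gradient adds lr * p_i * (log p_i + H) to z_i.
    """
    z = list(logits)
    for _ in range(steps):
        p = softmax(z)
        h = entropy(p)
        z = [
            zi + lr * pi * (math.log(pi) + h) if pi > 0.0 else zi
            for zi, pi in zip(z, p)
        ]
    return z
```

Tokens whose surprisal is below the current entropy gain logit mass and the rest lose it, so the distribution sharpens around choices the model already favors.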

41. BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
Keywords: Speculative decoding, Large Language Models, Multi-Armed Bandit, Hyperparameter selection, BanditSpec
Category: Natural Language Processing
Research Objective:
– Introduce BanditSpec, a training-free online learning framework that adaptively selects hyperparameters for speculative decoding in Large Language Models, aiming to improve performance and throughput.
Research Methods:
– Hyperparameter selection is formulated as a Multi-Armed Bandit problem.
– Two algorithms, UCBSpec and EXP3Spec, are developed and analyzed, with a focus on stopping time regret in both stochastic and adversarial settings.
Research Conclusions:
– Experiments show that BanditSpec, instantiated with UCBSpec and EXP3Spec, improves on existing methods and achieves throughput close to that of the best possible hyperparameter configuration in real-world scenarios.
Paper link: https://huggingface.co/papers/2505.15141
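
Illustrative sketch (not from the paper): treating each candidate hyperparameter configuration as a bandit arm can be shown with the textbook UCB1 rule. This is standard UCB1, not the paper's UCBSpec; the `simulate` helper, its Bernoulli rewards, and the acceptance rates passed to it are hypothetical stand-ins for how well each configuration's drafts are accepted.

```python
import math
import random


class UCB1:
    """Textbook UCB1 over a fixed set of arms with rewards in [0, 1]."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        # Play every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)
        scores = [
            mean + math.sqrt(2 * math.log(t) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running mean of the observed rewards for this arm.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(accept_rates, rounds=2000, seed=0):
    """Simulate arm selection where arm i pays 1 with probability accept_rates[i]."""
    rng = random.Random(seed)
    bandit = UCB1(len(accept_rates))
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < accept_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts
```

Calling `simulate([0.2, 0.8, 0.5])` returns how often each arm was chosen; the arm with the highest simulated acceptance rate ends up selected most often, with no training step involved.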

42.
