AI Native Daily Paper Digest – 20260508

1. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
🔑 Keywords: Skill1, skill selection, skill library, task-outcome objective, reinforcement learning
💡 Category: Reinforcement Learning
🌟 Research Objective:
– Develop Skill1, a unified framework that trains a single policy for skill selection, utilization, and distillation, achieving superior performance in complex tasks.
🛠️ Research Methods:
– The Skill1 framework co-evolves skills by generating queries, re-ranking skill-library candidates, solving tasks, and distilling new skills based on a shared task-outcome objective.
💬 Research Conclusions:
– Skill1 outperforms existing skill-based and reinforcement learning models in environments such as ALFWorld and WebShop, with evidence showing the effective co-evolution of capabilities.
👉 Paper link: https://huggingface.co/papers/2605.06130

2. Continuous Latent Diffusion Language Model
🔑 Keywords: Hierarchical Latent Diffusion Language Model, Non-autoregressive Inductive Bias, Global Semantic Prior, Text Generation, Scaling Behavior
💡 Category: Generative Models
🌟 Research Objective:
– To propose Cola DLM, a hierarchical latent diffusion language model, as a principled alternative for efficient text generation, moving beyond traditional autoregressive paradigms.
🛠️ Research Methods:
– Utilized a hierarchical information decomposition approach with text-to-latent mapping, global semantic prior modeling, and conditional decoding.
– Conducted experiments with comparisons to autoregressive and LLaDA baselines across 8 benchmarks and 4 research questions.
💬 Research Conclusions:
– Cola DLM successfully demonstrates flexible non-autoregressive inductive bias, supports semantic compression, and extends across continuous modalities.
– The results suggest hierarchical continuous latent prior modeling may offer better generation quality and scaling behavior than token-level approaches, pointing toward unified modeling of discrete and continuous modalities.
👉 Paper link: https://huggingface.co/papers/2605.06548

3. RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
🔑 Keywords: Heterogeneous Ensemble, Large Language Models, GPT-4o-mini, AI-Generated Summary, Domain-Adapted Model
💡 Category: Natural Language Processing
🌟 Research Objective:
– The study aimed to achieve top performance in the SemEval-2026 MTRAGEval task using a diverse ensemble of language models with dual prompting strategies.
🛠️ Research Methods:
– A heterogeneous ensemble of seven large language models was employed, each with different prompting variants. A GPT-4o-mini judge selected the best candidate per instance (sketched below).
💬 Research Conclusions:
– Model diversity was crucial for performance: the ensemble consistently outperformed any single model. The introduction of Meno-Lite-0.1 highlighted a strong cost-performance trade-off, and the study provided insights into MTRAGEval annotation limitations and potential improvements.
👉 Paper link: https://huggingface.co/papers/2605.04523
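
A minimal sketch of the judge-orchestrated selection loop described above; `call_model` is a hypothetical inference wrapper, and the prompt variants are illustrative, not the team's actual prompts:

```python
# Sketch of judge-orchestrated candidate selection (not the authors' code).

def call_model(model_name: str, prompt: str) -> str:
    """Hypothetical wrapper around an LLM inference endpoint; fill in for real use."""
    raise NotImplementedError

def ensemble_answer(question: str, models: list[str]) -> str:
    # Two prompting variants per model, mirroring the dual-prompting setup.
    variants = [
        f"Answer faithfully using only the given context.\n\n{question}",
        f"Think step by step, then answer concisely.\n\n{question}",
    ]
    candidates = [call_model(m, v) for m in models for v in variants]

    # A GPT-4o-mini judge picks the best candidate per instance.
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    judge_prompt = (
        "Given the question and candidate answers, reply with only the index "
        f"of the most faithful answer.\n\nQuestion: {question}\n\n{numbered}"
    )
    best = int(call_model("gpt-4o-mini", judge_prompt).strip())
    return candidates[best]
```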

4. When to Trust Imagination: Adaptive Action Execution for World Action Models
🔑 Keywords: World Action Models, Future-Reality Verification, Adaptive WAM Execution, Future Forward Dynamics Causal Attention, Mixture-of-Horizon Training
💡 Category: Robotics and Autonomous Systems
🌟 Research Objective:
– To enhance the execution of World Action Models (WAMs) by introducing adaptive mechanisms that ensure the predicted future remains consistent with real-world observations.
🛠️ Research Methods:
– Implementation of Future Forward Dynamics Causal Attention (FFDC) to verify consistency between predicted and real observations, enabling adaptive action execution.
– Introduction of Mixture-of-Horizon Training to improve the coverage of long-horizon trajectories in robotic manipulation.
💬 Research Conclusions:
– The proposed method improves the robustness-efficiency trade-off, reducing WAM forward passes and execution time while increasing the success rate in both benchmark and real-world experiments.
👉 Paper link: https://huggingface.co/papers/2605.06222

5. SkillOS: Learning Skill Curation for Self-Evolving Agents
🔑 Keywords: SkillOS, self-evolving agents, skill curation, composite rewards, SkillRepo
💡 Category: Reinforcement Learning
🌟 Research Objective:
– The study aims to develop SkillOS, a reinforcement learning framework that enables self-evolving LLM agents to learn complex long-term skill curation policies, improving performance across diverse tasks and executor architectures.
🛠️ Research Methods:
– SkillOS pairs a frozen agent executor with a trainable skill curator, utilizing composite rewards and grouped task streams to learn from skill-relevant task dependencies, updating an external SkillRepo based on experience.
💬 Research Conclusions:
– SkillOS consistently surpasses both memory-free and memory-based baselines in effectiveness and efficiency; the learned skill curator generalizes well across executor backbones and task domains, producing more targeted skill use and evolving higher-level meta-skills.
👉 Paper link: https://huggingface.co/papers/2605.06614

6. Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
🔑 Keywords: empirical loop, lineage feedback, specialist agents, program-level recipe edits, AI-generated summary
💡 Category: AI Systems and Tools
🌟 Research Objective:
– Investigate an empirical loop that autonomously refines code through external evaluation feedback without human intervention.
🛠️ Research Methods:
– Utilize specialist agents to create trials that include code edits and evaluations, iterating over an auditable trajectory of proposals and experiments.
💬 Research Conclusions:
– Demonstrated significant improvements in various tasks such as Parameter Golf validation, NanoChat-D12 CORE, and CIFAR-10 Airbench96 wallclock time without human proposal or intervention.
👉 Paper link: https://huggingface.co/papers/2605.05724

7. Audio-Visual Intelligence in Large Foundation Models
🔑 Keywords: Audio-Visual Intelligence, large foundation models, multimodal data, cross-modal fusion
💡 Category: Multi-Modal Learning
🌟 Research Objective:
– The paper aims to provide a comprehensive review of Audio-Visual Intelligence (AVI) through the lens of large foundation models, establishing a unified taxonomy for understanding, generation, and interaction tasks within this multidisciplinary field.
🛠️ Research Methods:
– The survey synthesizes methodological foundations such as modality tokenization, cross-modal fusion, autoregressive and diffusion-based generation, and large-scale pretraining to structure and integrate diverse tasks and practices in AVI.
💬 Research Conclusions:
– A coherent framework is established, offering structured comparisons across task families and identifying open challenges in synchronization, spatial reasoning, controllability, and safety. The survey underscores the importance of unified audio-vision architectures for future research in large-scale AVI.
👉 Paper link: https://huggingface.co/papers/2605.04045

8. Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
🔑 Keywords: ScaleLogic, Reinforcement Learning, Logical Reasoning, Scaling Exponent, Curriculum-Based Training
💡 Category: Reinforcement Learning
🌟 Research Objective:
– To systematically study how reinforcement learning training compute scales with task difficulty using a new framework, ScaleLogic.
🛠️ Research Methods:
– Introduced ScaleLogic, a synthetic logical reasoning framework allowing independent control over reasoning depth and logical expressiveness, and analyzed how reinforcement learning compute scales with these factors.
💬 Research Conclusions:
– Reinforcement learning compute scales as a power law with reasoning depth, with the scaling exponent increasing with logical expressiveness (illustrated below). More expressive training leads to larger performance gains and better compute efficiency on downstream tasks.
👉 Paper link: https://huggingface.co/papers/2605.06638
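
An illustrative reading of the reported scaling behavior; the constants below are invented for demonstration and are not the paper's fitted values:

```python
# Illustration only: compute C needed to solve tasks of reasoning depth d
# follows a power law C(d) ≈ a * d**alpha, with alpha growing as the logic
# becomes more expressive (per the summary above).
def rl_compute(depth: int, a: float, alpha: float) -> float:
    return a * depth ** alpha

for name, alpha in [("less expressive", 1.3), ("more expressive", 2.1)]:
    costs = {d: round(rl_compute(d, a=1.0, alpha=alpha), 1) for d in (2, 4, 8)}
    print(name, costs)

# Doubling depth multiplies compute by 2**alpha, so higher expressiveness
# steepens the curve.
```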

9. ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
🔑 Keywords: ReflectDrive-2, autonomous driving, discrete diffusion planner, reinforcement learning, trajectory revision
💡 Category: Robotics and Autonomous Systems
🌟 Research Objective:
– The study introduces ReflectDrive-2, aiming to improve autonomous driving by enabling efficient trajectory revision through a masked discrete diffusion planner and parallel decoding.
🛠️ Research Methods:
– Utilizes a two-stage training approach, combining structure-aware perturbations to refine trajectories and reinforcement learning to enhance trajectory revision and decision-making.
– Implements a decision–draft–reflect pipeline co-designed with a reflective decoding stack to optimize performance.
💬 Research Conclusions:
– ReflectDrive-2 demonstrates significant improvement in PDMS performance, reaching 91.0 with camera-only input and 94.8 in an oracle setup on NAVSIM, with a low average latency of 31.8 ms on NVIDIA Thor.
👉 Paper link: https://huggingface.co/papers/2605.04647

10. SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation
🔑 Keywords: SwiftI2V, High-resolution I2V, Conditional Segment-wise Generation, bidirectional contextual interaction, token budget
💡 Category: Generative Models
🌟 Research Objective:
– To develop an efficient high-resolution image-to-video (I2V) generation framework that addresses existing challenges and achieves scalable, input-faithful video synthesis with reduced computational requirements.
🛠️ Research Methods:
– SwiftI2V employs a two-stage design with Conditional Segment-wise Generation to synthesize videos segment by segment, using a bounded per-step token budget to improve efficiency.
– The approach utilizes bidirectional contextual interaction to enhance cross-segment coherence and input fidelity.
💬 Research Conclusions:
– SwiftI2V significantly reduces the computational load, achieving a 202x reduction in total GPU-time, and enables practical 2K I2V generation on both datacenter and consumer GPUs, maintaining performance comparable to end-to-end baselines.
👉 Paper link: https://huggingface.co/papers/2605.06356

11. RemoteZero: Geospatial Reasoning with Zero Human Annotations
🔑 Keywords: RemoteZero, Geospatial Reasoning, MLLM, Self-Evolution, Semantic Verification
💡 Category: Knowledge Representation and Reasoning
🌟 Research Objective:
– Introduce RemoteZero to enable geospatial reasoning without box supervision, utilizing the semantic verification capabilities of MLLMs to improve localization from unlabeled remote sensing data.
🛠️ Research Methods:
– Leverage the discriminative ability of MLLMs for semantic verification instead of traditional geometric supervision to facilitate GRPO training without box annotations.
💬 Research Conclusions:
– RemoteZero achieves competitive performance against strong supervised methods, showcasing the potential of self-verifying training for geospatial reasoning and localization.
👉 Paper link: https://huggingface.co/papers/2605.04451

12. The Scaling Properties of Implicit Deductive Reasoning in Transformers
🔑 Keywords: Deep Transformers, bidirectional masking, implicit deductive reasoning, Horn clauses, algorithmic alignment
💡 Category: Knowledge Representation and Reasoning
🌟 Research Objective:
– To investigate the scaling properties of implicit deductive reasoning in depth-bounded Transformers using bidirectional masking.
🛠️ Research Methods:
– Systematic decorrelation of provability from spurious features and enforcement of algorithmic alignment in deep models.
💬 Research Conclusions:
– Implicit reasoning in sufficiently deep models with bidirectional prefix masking can closely match explicit chain-of-thought performance across various graph structures and problem sizes. However, explicit chain-of-thought methods remain necessary for depth extrapolation.
👉 Paper link: https://huggingface.co/papers/2605.04330

13. Prescriptive Scaling Laws for Data Constrained Training
🔑 Keywords: Data-Constrained Regimes, Overfitting Penalty, Compute-Optimal Allocation, Weight Decay, Scaling Law
💡 Category: Machine Learning
🌟 Research Objective:
– To modify the Chinchilla scaling law to account for data repetition effects and provide compute-optimal training strategies in data-constrained scenarios.
🛠️ Research Methods:
– Model excess loss under data repetition with an additive overfitting penalty, allowing for adjustments in compute allocation (see the sketch below).
💬 Research Conclusions:
– The study finds that further data repetition is counterproductive after a certain point, and resources are better allocated to model capacity. It also concludes that strong weight decay significantly reduces the overfitting coefficient, aligning with recent findings in data-constrained regimes.
👉 Paper link: https://huggingface.co/papers/2605.01640
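
A minimal sketch of the modified scaling law's shape; the functional form of the penalty and all constants here are assumptions for illustration, not the paper's fitted model:

```python
# Chinchilla-style loss in model size N and unique data D, plus an additive
# overfitting penalty that grows with the repetition count R over that data.
# All constants are placeholders, not the paper's fitted values.
def excess_loss(N: float, D: float, R: float,
                A=400.0, B=400.0, alpha=0.34, beta=0.28, k=0.01) -> float:
    chinchilla = A / N**alpha + B / D**beta
    overfit_penalty = k * R  # simplistic stand-in for the fitted penalty term
    return chinchilla + overfit_penalty

# Past some repetition count, the penalty dominates the shrinking data term,
# so extra compute is better spent on model capacity N than on more epochs.
```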

14. PianoCoRe: Combined and Refined Piano MIDI Dataset
🔑 Keywords: PianoCoRe, music information retrieval, MIDI, note-level alignment, expressive performance modeling
💡 Category: AI Systems and Tools
🌟 Research Objective:
– The study introduces PianoCoRe, a large-scale piano MIDI dataset, to enhance applications in music information retrieval by providing diverse performances and note-level alignments.
🛠️ Research Methods:
– The authors curated and refined major open-source piano corpora, resulting in a dataset with 250,046 performances and tiered subsets to support various applications. They also developed a MIDI quality classifier and the RAScoP alignment refinement pipeline.
💬 Research Conclusions:
– PianoCoRe reduces temporal noise and eliminates tempo outliers, showing improved robustness in expressive performance modeling compared to models trained on smaller datasets. This positions PianoCoRe as a comprehensive resource for future piano performance research.
👉 Paper link: https://huggingface.co/papers/2605.06627

15. Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
🔑 Keywords: Multimodal Domain Generalization, AI-generated summary, action recognition, sentiment analysis, neural networks
💡 Category: Multi-Modal Learning
🌟 Research Objective:
– Introduce MMDG-Bench, a unified benchmark for evaluating Multimodal Domain Generalization (MMDG) across diverse tasks and modalities.
🛠️ Research Methods:
– Standardized evaluation using six datasets across three tasks: action recognition, mechanical fault diagnosis, and sentiment analysis. The benchmark covers six modality combinations and nine representative methods, with systematic assessment criteria including corruption robustness and missing-modality generalization.
💬 Research Conclusions:
– Specialized MMDG methods show only marginal improvements over the ERM baseline.
– No single method consistently outperforms across all settings, and a significant performance gap remains.
– Trimodal fusion does not consistently surpass bimodal configurations.
– All methods suffer notably under corruption and missing-modality scenarios, affecting model trustworthiness.
👉 Paper link: https://huggingface.co/papers/2605.06643

16. EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions
🔑 Keywords: MLLMs, handwritten solutions, auto-grading, upstream recognition, AI-enabled grading system
💡 Category: AI in Education
🌟 Research Objective:
– The study aims to evaluate the capabilities of Multimodal Large Language Models (MLLMs) in interpreting complex handwritten STEM student solutions to improve educational grading systems.
🛠️ Research Methods:
– Released the EDU-CIRCUIT-HW dataset with 1,300+ authentic handwritten student solutions.
– Conducted evaluations using expert-verified transcriptions and grading reports, assessing both upstream recognition fidelity and downstream auto-grading performance of various MLLMs.
💬 Research Conclusions:
– MLLMs exhibit significant limitations in understanding complex handwritten logic, affecting their reliability for auto-grading.
– A proposed hybrid approach, combining error detection with minimal human oversight, can enhance AI-enabled grading robustness.
👉 Paper link: https://huggingface.co/papers/2602.00095

17. Generative Quantum-inspired Kolmogorov-Arnold Eigensolver
🔑 Keywords: Quantum Chemistry, Kolmogorov-Arnold eigensolver, Quantum-inspired, HPC, Strongly Correlated Systems
💡 Category: Quantum Machine Learning
🌟 Research Objective:
– The study aims to reduce classical computational overhead in quantum chemistry workflows while maintaining accuracy and improving convergence for strongly correlated systems using a generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE).
🛠️ Research Methods:
– GQKAE is a parameter-efficient extension of the generative quantum eigensolver, replacing parameter-heavy networks with hybrid quantum-inspired Kolmogorov-Arnold modules. It utilizes single-qubit DatA Re-Uploading ActivatioN modules for expressive mappings.
💬 Research Conclusions:
– GQKAE achieves chemical accuracy comparable to GPT-based architectures while reducing trainable parameters and memory by approximately 66%. It also improves convergence and lowers final energy errors for strongly correlated systems such as N2 and LiH, offering a scalable approach to HPC-quantum co-design on near-term quantum platforms.
👉 Paper link: https://huggingface.co/papers/2605.04604

18. Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
🔑 Keywords: Background Replacement, Video Editing, Foreground-Background Interactions, Data Synthesis, Evaluation Benchmark
💡 Category: Computer Vision
🌟 Research Objective:
– Introduce a new dataset and benchmark for background replacement in video editing, addressing the limitations of existing datasets with a scalable pipeline.
🛠️ Research Methods:
– Designed a pipeline that decouples foreground and background guidance, with strict quality filtering to generate high-quality datasets.
💬 Research Conclusions:
– The Sparkle dataset and the model trained on it show substantially better performance than existing baselines on both OpenVE-Bench and Sparkle-Bench, filling a significant gap in background replacement tasks.
👉 Paper link: https://huggingface.co/papers/2605.06535

19. TIDE: Every Layer Knows the Token Beneath the Context
🔑 Keywords: TIDE, EmbeddingMemory, Rare Token Problem, Contextual Collapse Problem
💡 Category: Natural Language Processing
🌟 Research Objective:
– The study aims to address limitations in large language model (LLM) design by introducing TIDE to mitigate the Rare Token and Contextual Collapse Problems.
🛠️ Research Methods:
– TIDE augments the transformer with an EmbeddingMemory system consisting of independent MemoryBlocks that map token indices to context-free semantic vectors, injected at each layer through a depth-conditioned softmax router (sketched below).
💬 Research Conclusions:
– TIDE's single-token identity injection effectively mitigates both problems, enhancing performance in language modeling and various downstream tasks.
👉 Paper link: https://huggingface.co/papers/2605.06216
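
A minimal sketch of the EmbeddingMemory idea in PyTorch; shapes, block count, and router parameterization are assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class EmbeddingMemory(nn.Module):
    """Context-free token memory with a depth-conditioned softmax router."""
    def __init__(self, vocab: int, d_model: int, n_blocks: int, n_layers: int):
        super().__init__()
        # Each MemoryBlock maps raw token indices to context-free vectors.
        self.blocks = nn.ModuleList(
            nn.Embedding(vocab, d_model) for _ in range(n_blocks)
        )
        # One routing logit per (layer, block): the "depth-conditioned" part.
        self.router = nn.Parameter(torch.zeros(n_layers, n_blocks))

    def forward(self, token_ids: torch.Tensor, layer: int) -> torch.Tensor:
        weights = torch.softmax(self.router[layer], dim=-1)      # (n_blocks,)
        vecs = torch.stack([b(token_ids) for b in self.blocks])  # (n_blocks, B, T, d)
        return (weights[:, None, None, None] * vecs).sum(0)      # (B, T, d)

# Inside the transformer, each layer would add this context-free signal to its
# hidden states: h = h + memory(token_ids, layer).
```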

20.

21. BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models
🔑 Keywords: Biomedical tool-calling, Large Language Model, AI Native, Fine-tuning, BioTool
💡 Category: AI in Healthcare
🌟 Research Objective:
– To develop BioTool, a comprehensive biomedical tool-calling dataset aimed at improving the performance of large language models (LLMs) in specialized biomedical domains.
🛠️ Research Methods:
– Fine-tuning a 4-billion-parameter LLM on the BioTool dataset, which includes 34 tools and 7,040 human-verified API call pairs covering areas like genomics and proteomics.
💬 Research Conclusions:
– BioTool fine-tuning significantly boosts LLM performance in biomedical tool-calling, surpassing commercial alternatives like GPT-5.1, and enhances downstream answer quality as per human expert evaluations.
👉 Paper link: https://huggingface.co/papers/2605.05758

22. Recovering Hidden Reward in Diffusion-Based Policies
🔑 Keywords: EnergyFlow, inverse reinforcement learning, reward extraction, policy generalization, structural constraints
💡 Category: Reinforcement Learning
🌟 Research Objective:
– The paper introduces EnergyFlow, a framework designed to unify generative action modeling with inverse reinforcement learning to enhance reward extraction and policy generalization.
🛠️ Research Methods:
– The authors parameterize a scalar energy function whose gradient acts as a denoising field, facilitating reward extraction without adversarial training (sketched below).
– The framework capitalizes on maximum-entropy optimality and denoising score matching to recover the expert's soft Q-function gradients.
💬 Research Conclusions:
– EnergyFlow achieves state-of-the-art performance in imitation tasks and provides a robust reward signal for reinforcement learning, outperforming traditional IRL methods.
– Structural constraints reduce hypothesis complexity and improve out-of-distribution generalization, serving as inductive biases for policy generalization.
👉 Paper link: https://huggingface.co/papers/2605.00623
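
A sketch of the core parameterization, assuming a standard MLP energy network; the gradient of the scalar energy with respect to the action plays the role of the denoising field:

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scalar energy E(s, a); architecture here is an assumption, not the paper's."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)  # scalar energy

def denoising_field(energy: EnergyNet, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    # The action-gradient of the energy serves as the denoising direction in
    # score matching; -E(s, a) can then be read off as the recovered reward proxy.
    a = a.requires_grad_(True)
    e = energy(s, a).sum()
    return torch.autograd.grad(e, a, create_graph=True)[0]
```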

23. When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
🔑 Keywords: Safety Scoring, Scenario-Based Audit, Instrumental-Validity Chain, AUROC, Local-First Scoring
💡 Category: AI Systems and Tools
🌟 Research Objective:
– The study aims to establish a method for benchmarkless comparative safety scoring to evaluate language model safety in the absence of pre-existing labeled benchmarks.
🛠️ Research Methods:
– Utilizes scenario-based audits and an instrumental-validity chain to assess responsiveness, variance dominance, and stability. Demonstrated with a local-first scoring instrument, SimpleAudit, validated on a Norwegian safety pack.
💬 Research Conclusions:
– In the Norwegian public-sector case, safety assessments vary with scenario category and risk measure. The study stresses that scores and related metrics must be reported collectively to constitute meaningful deployment evidence.
👉 Paper link: https://huggingface.co/papers/2605.06652

24. GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
🔑 Keywords: GeoStack, Vision-Language Models, domain experts, adapter manifold, catastrophic forgetting
💡 Category: Multi-Modal Learning
🌟 Research Objective:
– Address the challenge of knowledge composition in Vision-Language Models without causing catastrophic forgetting by using GeoStack.
🛠️ Research Methods:
– Introduce a modular framework, GeoStack, which enforces geometric and structural constraints on adapter manifolds.
– Demonstrate a weight-folding property that ensures O(1) inference complexity regardless of the number of domain experts integrated.
💬 Research Conclusions:
– GeoStack effectively provides long-term knowledge composition, significantly mitigating catastrophic forgetting while enabling efficient multi-domain adaptation and class-incremental learning.
👉 Paper link: https://huggingface.co/papers/2605.06477

25. Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
🔑 Keywords: video reward models, Chain-of-Thought reasoning, decoupled think-then-score, reinforcement learning, multimodal large language models
💡 Category: Generative Models
🌟 Research Objective:
– The study aims to enhance the training efficiency and generalization of video reward models by decoupling the thinking and scoring processes.
🛠️ Research Methods:
– The researchers introduced DeScore, a two-stage framework that combines a discriminative cold start with a random mask mechanism and dual-objective reinforcement learning.
💬 Research Conclusions:
– DeScore improves interpretability and generalization through its "think-then-score" paradigm, enhancing the model's reasoning quality and ensuring alignment with human preferences.
👉 Paper link: https://huggingface.co/papers/2605.05922

26. Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO
🔑 Keywords: Balanced Aggregation, Reinforcement Learning, Token-Level Policy Gradient, Training Stability, Final Performance
💡 Category: Reinforcement Learning
🌟 Research Objective:
– To enhance reinforcement learning with verifiable rewards by addressing optimization biases in token-level policy gradient aggregation.
🛠️ Research Methods:
– Implementation of Balanced Aggregation, computing token-level means within positive and negative subsets and combining them with sequence-count-based weights (sketched below).
💬 Research Conclusions:
– Balanced Aggregation improves training stability and performance over standard token and sequence aggregation, highlighting the critical role of aggregation in GRPO-style reinforcement learning.
👉 Paper link: https://huggingface.co/papers/2605.04077
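
A minimal sketch of the balanced-aggregation computation, assuming per-token losses and per-sequence advantages; the exact weighting scheme is our reading of the summary, not the authors' code:

```python
import torch

def balanced_aggregate(token_loss: torch.Tensor,     # (n_tokens,) per-token loss
                       seq_advantage: torch.Tensor,  # (n_seqs,) per-sequence advantage
                       seq_ids: torch.Tensor) -> torch.Tensor:  # (n_tokens,) seq index
    """Token-level means within positive/negative subsets, combined with
    sequence-count-based weights."""
    pos_seqs = seq_advantage > 0
    pos_tok = pos_seqs[seq_ids]  # mask of tokens belonging to positive sequences

    zero = token_loss.new_zeros(())
    pos_mean = token_loss[pos_tok].mean() if pos_tok.any() else zero
    neg_mean = token_loss[~pos_tok].mean() if (~pos_tok).any() else zero

    # Weight each subset's mean by how many sequences fall in that subset.
    n_pos, n_neg = pos_seqs.sum(), (~pos_seqs).sum()
    total = (n_pos + n_neg).clamp(min=1)
    return (n_pos * pos_mean + n_neg * neg_mean) / total
```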

27. KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
🔑 Keywords: KernelBench-X, Triton kernel generation, correctness, iterative refinement, hardware efficiency
💡 Category: AI Systems and Tools
🌟 Research Objective:
– To investigate where LLM-generated Triton kernels break down, and why, using the KernelBench-X benchmark.
🛠️ Research Methods:
– Systematic comparison of five representative methods across 176 tasks in 15 categories, focusing on category-aware evaluation of correctness and hardware efficiency.
💬 Research Conclusions:
– Task structure affects kernel correctness more than method design.
– Iterative refinement improves correctness but at the cost of performance.
– Correctness does not guarantee efficiency; kernel performance varies significantly.
👉 Paper link: https://huggingface.co/papers/2605.04956

28. The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
🔑 Keywords: Large language models (LLMs), social roles, Granularity Axis, hidden states, activation steering
💡 Category: Natural Language Processing
🌟 Research Objective:
– Investigate whether large language models encode the granularity of social roles, ranging from individual to organizational levels, in their internal representations.
🛠️ Research Methods:
– Define a contrast-based Granularity Axis to analyze the role representation space and project role-level hidden states onto it (sketched below).
– Conduct experiments with Qwen3-8B and Llama-3.1-8B-Instruct, constructing 75 social roles across five granularity levels and collecting 91,200 role-conditioned responses.
💬 Research Conclusions:
– Social role granularity is a structured and causally manipulable latent direction in language model behavior.
– Activation steering along the Granularity Axis effectively shifts response granularity, with differences in controllability between models suggesting variability in default operating regimes.
👉 Paper link: https://huggingface.co/papers/2605.06196
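
A sketch of contrast-based axis extraction and steering; variable names and the sign convention are illustrative:

```python
import torch

def granularity_axis(h_macro: torch.Tensor, h_micro: torch.Tensor) -> torch.Tensor:
    """Normalized difference between mean hidden states of macro-level and
    micro-level role prompts; shapes (n_prompts, d_model)."""
    v = h_macro.mean(0) - h_micro.mean(0)
    return v / v.norm()

def steer(hidden: torch.Tensor, axis: torch.Tensor, alpha: float) -> torch.Tensor:
    # Add a scaled copy of the axis to the residual stream at chosen layers.
    # alpha > 0 pushes toward organizational-level responses, alpha < 0 toward
    # individual-level ones (sign convention assumed).
    return hidden + alpha * axis
```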

29. AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
🔑 Keywords: AI co-mathematician, mathematical workflows, theorem proving, stateful workspace, problem-solving benchmarks
💡 Category: AI Systems and Tools
🌟 Research Objective:
– To create an interactive platform, the AI co-mathematician, that assists mathematicians in open-ended research using AI agents.
🛠️ Research Methods:
– Provides holistic support for workflows including ideation, literature search, theorem proving, and theory building through a stateful, asynchronous workspace.
💬 Research Conclusions:
– The AI co-mathematician successfully aids researchers in solving open problems, discovering new research directions, and identifying overlooked literature. It achieves state-of-the-art results, including a 48% score on the FrontierMath Tier 4 benchmark.
👉 Paper link: https://huggingface.co/papers/2605.06651

30. TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
🔑 Keywords: TabEmbed, Tabular Embedding Benchmark, Semantic Matching, Contrastive Learning, Universal Tabular Representation Learning
💡 Category: Machine Learning
🌟 Research Objective:
– Introduce TabEmbed, a generalist embedding model that unifies tabular classification and retrieval within a shared embedding space.
🛠️ Research Methods:
– Employ large-scale contrastive learning with positive-aware hard negative mining, reformulating tabular tasks as semantic matching problems.
💬 Research Conclusions:
– TabEmbed significantly outperforms existing text embedding models on TabBench, setting a new standard for universal tabular representation learning.
👉 Paper link: https://huggingface.co/papers/2605.04962

31. UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
🔑 Keywords: Mixture-of-Experts, Shared Expert Pool, Stable Routing, Parameter Growth, Depth Scaling
💡 Category: Machine Learning
🌟 Research Objective:
– To introduce UniPool, a novel shared expert pool architecture for Mixture-of-Experts (MoE) models, aimed at reducing parameter growth with depth while maintaining or improving model performance.
🛠️ Research Methods:
– UniPool employs a global shared pool of expert capacity, accessed via independent per-layer routers (sketched below). The design incorporates stable and balanced training mechanisms, including a pool-level auxiliary loss and the NormRouter for scale-stable routing.
💬 Research Conclusions:
– UniPool consistently improves validation loss and perplexity over conventional MoE baselines across various model scales, demonstrating that expert parameters can grow sublinearly with depth under a shared-pool architecture, enhancing efficiency without linear parameter expansion.
👉 Paper link: https://huggingface.co/papers/2605.06665
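
A minimal sketch of a globally shared expert pool with per-layer routers; dimensions, top-k routing, and the omission of NormRouter and auxiliary-loss details are assumptions:

```python
import torch
import torch.nn as nn

class SharedExpertPool(nn.Module):
    """One global pool of FFN experts, shared across all transformer layers."""
    def __init__(self, n_experts: int, d_model: int, d_ff: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

class PooledMoELayer(nn.Module):
    """One layer's MoE block: its own router, but experts come from the pool."""
    def __init__(self, pool: SharedExpertPool, d_model: int, top_k: int = 2):
        super().__init__()
        self.pool, self.top_k = pool, top_k
        self.router = nn.Linear(d_model, len(pool.experts))  # per-layer router

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, d_model)
        weights, idx = torch.topk(
            torch.softmax(self.router(x), dim=-1), self.top_k, dim=-1
        )
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.pool.experts):
                mask = idx[..., k] == e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Because every layer routes into the same pool, total expert parameters are set by the pool size rather than growing linearly with depth.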

32. A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
🔑 Keywords: Reinforcement Learning, Agentic LLMs, Information Gain, Policy Optimization, Adaptive Turn-level Clipping
💡 Category: Reinforcement Learning
🌟 Research Objective:
– To improve policy optimization for agentic large language models (LLMs) that suffer from sparse rewards and credit assignment issues through a novel approach, A²TGPO.
🛠️ Research Methods:
– Utilizes Information Gain as an intrinsic process signal while introducing turn-group normalization, variance-rescaled discounted accumulation, and adaptive turn-level clipping to optimize policy updates.
💬 Research Conclusions:
– The proposed A²TGPO method effectively redesigns the normalization, accumulation, and clipping of information gain to better evaluate and optimize policies in reinforcement learning for agentic LLMs.
👉 Paper link: https://huggingface.co/papers/2605.06200

33. StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
🔑 Keywords: Strategic Trajectory Abstraction, trajectory-level strategy, reinforcement learning, sample efficiency, final performance
💡 Category: Reinforcement Learning
🌟 Research Objective:
– To introduce Strategic Trajectory Abstraction (StraTA), a framework for enhancing long-horizon decision making in large language models.
🛠️ Research Methods:
– Implementation of StraTA, which combines trajectory-level strategy with a hierarchical GRPO-style rollout design, with enhancements such as diverse strategy rollout and critical self-judgment.
💬 Research Conclusions:
– StraTA significantly improves sample efficiency and performance, achieving success rates of 93.1% on ALFWorld and 84.2% on WebShop, and a 63.5% overall score on SciWorld, surpassing existing strong baselines.
👉 Paper link: https://huggingface.co/papers/2605.06642

34. Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
🔑 Keywords: Reinforcement Learning, verifiable rewards, Large Language Models, zero-advantage problem, Lorem Perturbation for Exploration
💡 Category: Reinforcement Learning
🌟 Research Objective:
– To address the zero-advantage problem in reinforcement learning when training Large Language Models by introducing a novel approach called Lorem Perturbation for Exploration (LoPE).
🛠️ Research Methods:
– Proposes prompt-space perturbations that prepend sequences from the Lorem Ipsum vocabulary to prompts, enhancing exploration during training across various model sizes (1.7B, 4B, and 7B); a minimal sketch follows below.
💬 Research Conclusions:
– LoPE significantly improves exploration success rates compared to conventional resampling methods, establishing itself as an effective baseline for enhancing exploration in LLM reinforcement learning applications.
👉 Paper link: https://huggingface.co/papers/2605.05566
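
A minimal sketch of the LoPE perturbation; the vocabulary snippet and prefix length are assumptions:

```python
import random

# Prepend a short random lorem-ipsum sequence to the prompt so that
# otherwise-identical rollouts start from perturbed contexts, breaking the
# zero-advantage ties described above.
LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit "
         "sed do eiusmod tempor incididunt").split()

def lope_perturb(prompt: str, n_words: int = 8,
                 rng: random.Random | None = None) -> str:
    rng = rng or random.Random()
    prefix = " ".join(rng.choice(LOREM) for _ in range(n_words))
    return f"{prefix}\n\n{prompt}"

# Example: generate k rollouts per prompt, each with a fresh perturbation.
# rollouts = [policy.generate(lope_perturb(p)) for _ in range(k)]
```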

35. Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
🔑 Keywords: Continuous-Time Distribution Matching, diffusion model distillation, velocity field extrapolation, visual fidelity
💡 Category: Generative Models
🌟 Research Objective:
– The paper introduces Continuous-Time Distribution Matching (CDM) to transition diffusion model distillation from discrete to continuous optimization.
🛠️ Research Methods:
– Two continuous-time designs, a dynamic continuous schedule and a continuous-time alignment objective, are utilized to enforce distribution matching at arbitrary points on sampling trajectories.
💬 Research Conclusions:
– CDM enhances visual fidelity for few-step image generation without complex auxiliary objectives, as evidenced by experiments on architectures such as SD3-Medium and Longcat-Image.
👉 Paper link: https://huggingface.co/papers/2605.06376

36. MARBLE: Multi-Aspect Reward Balance for Diffusion RL
🔑 Keywords: MARBLE, multi-reward reinforcement learning, diffusion models, quadratic programming, policy gradients
💡 Category: Reinforcement Learning
🌟 Research Objective:
– To address limitations in multi-reward reinforcement learning fine-tuning of diffusion models by leveraging a gradient-space optimization framework without manual reward weighting.
🛠️ Research Methods:
– Implementation of MARBLE, which maintains independent advantage estimators and harmonizes policy gradients by solving a quadratic programming problem (sketched below). Introduces an amortized formulation to reduce computational costs while stabilizing updates with EMA smoothing.
💬 Research Conclusions:
– MARBLE improves alignment across all reward dimensions on a test model with five rewards, turning the worst-aligned reward's gradient cosine consistently positive in the majority of mini-batches, while maintaining nearly baseline training speed.
👉 Paper link: https://huggingface.co/papers/2605.06507
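
A sketch of gradient harmonization in the spirit of MARBLE; the specific min-norm objective and projected-gradient solver below follow the classic MGDA-style formulation, which is an assumption about the paper's exact QP:

```python
import torch

def harmonize(grads: list[torch.Tensor], steps: int = 200, lr: float = 0.1) -> torch.Tensor:
    """Find simplex weights w minimizing ||G^T w||^2 over per-reward gradients,
    so the combined update does not strongly oppose any single reward."""
    G = torch.stack([g.flatten() for g in grads])  # (n_rewards, n_params)
    gram = G @ G.T
    w = torch.full((G.shape[0],), 1.0 / G.shape[0], requires_grad=True)
    opt = torch.optim.SGD([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = w @ gram @ w  # ||sum_i w_i g_i||^2
        loss.backward()
        opt.step()
        with torch.no_grad():  # project weights back onto the simplex
            w.clamp_(min=0)
            w.div_(w.sum().clamp(min=1e-8))
    return (w.detach()[:, None] * G).sum(0)  # harmonized gradient
```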

37. MiA-Signature: Approximating Global Activation for Long-Context Understanding
🔑 Keywords: Mindscape Activation Signature, compressed representation, global activation pattern, long-context understanding, computational efficiency
💡 Category: Natural Language Processing
🌟 Research Objective:
– To propose a compressed representation, termed the Mindscape Activation Signature (MiA-Signature), for approximating global activation states in large language models while retaining computational efficiency.
🛠️ Research Methods:
– Construct MiA-Signatures through submodular selection of high-level concepts, refined by lightweight iterative updates with the aid of working memory.
💬 Research Conclusions:
– Integrating MiA-Signatures into RAG and agentic systems yields consistent performance gains on long-context understanding tasks.
👉 Paper link: https://huggingface.co/papers/2605.06416

38. Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
🔑 Keywords: Direct corpus interaction, Agentic search, Modern retrieval systems, BRIGHT and BEIR datasets, Multi-hop QA
💡 Category: Natural Language Processing
🌟 Research Objective:
– To improve agentic search effectiveness by enabling agents to query raw text directly, surpassing the limitations of traditional retrieval methods.
🛠️ Research Methods:
– The study employs direct corpus interaction (DCI) using general-purpose terminal tools, without reliance on embedding models or retrieval APIs (sketched below).
💬 Research Conclusions:
– Direct corpus interaction significantly outperforms existing retrieval methods across various IR benchmarks and agentic search tasks, offering a broader interface-design space for enhanced retrieval and reasoning.
👉 Paper link: https://huggingface.co/papers/2605.05242
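
A minimal sketch of direct corpus interaction; the choice of grep as the terminal tool is an assumption consistent with "general-purpose terminal tools":

```python
import subprocess

def grep_corpus(pattern: str, corpus_dir: str, max_hits: int = 20) -> list[str]:
    """Query raw corpus files directly with grep instead of an embedding index:
    -r recurses, -i ignores case, -n prints line numbers, -m caps hits per file."""
    result = subprocess.run(
        ["grep", "-r", "-i", "-n", "-m", str(max_hits), pattern, corpus_dir],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines()[:max_hits]

# An agent loop would alternate between proposing patterns, inspecting hits,
# and refining the query, e.g. grep_corpus("multi-hop", "./corpus").
```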
