AI Native Daily Paper Digest – 20250825

1. AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

๐Ÿ”‘ Keywords: Memory-augmented Markov Decision Process (M-MDP), Neural case-selection policy, Episodic memory, Continuous learning

๐Ÿ’ก Category: Reinforcement Learning

๐ŸŒŸ Research Objective:

– Introduce a novel reinforcement learning paradigm for Large Language Model (LLM) agents to facilitate continuous learning without fine-tuning.

๐Ÿ› ๏ธ Research Methods:

– Utilizes an M-MDP framework with a neural case-selection policy, incorporating episodic memory and efficient memory retrieval mechanisms.

๐Ÿ’ฌ Research Conclusions:

– Demonstrated successful application in the AgentFly model, outperforming current methods with improved performance on various tasks and datasets.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.16153

2. ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

๐Ÿ”‘ Keywords: ODYSSEY, language-guided, hierarchical planner, vision-language model, sim-to-real transfer

๐Ÿ’ก Category: Robotics and Autonomous Systems

๐ŸŒŸ Research Objective:

– Develop a unified mobile manipulation framework for quadruped robots integrating high-level task planning with low-level whole-body control.

๐Ÿ› ๏ธ Research Methods:

– Introduce a hierarchical planner powered by a vision-language model for precise action execution in language-conditioned tasks.

– Establish a robust whole-body policy for improved coordination across challenging terrains.

๐Ÿ’ฌ Research Conclusions:

– Demonstrate the system’s generalization and robustness in real-world deployments, advancing practical applicability of legged manipulators in unstructured environments.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.08240

3. Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

๐Ÿ”‘ Keywords: RLVR, Large Language Models, Self-play, Variational problem Synthesis, Pass@k

๐Ÿ’ก Category: Reinforcement Learning

๐ŸŒŸ Research Objective:

– To improve the performance of Reinforcement Learning with Verifiable Rewards (RLVR) on complex reasoning tasks by maintaining policy entropy and enhancing Pass@k performance.

๐Ÿ› ๏ธ Research Methods:

– Developed an online Self-play strategy with Variational problem Synthesis (SvS) that uses policy’s correct solutions to generate variational problems, maintaining identical reference answers.

๐Ÿ’ฌ Research Conclusions:

– The SvS strategy substantially enhances policy entropy and improves Pass@k performance compared to standard RLVR, achieving significant gains on benchmarks like AIME24 and AIME25.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.14029

4. EgoTwin: Dreaming Body and View in First Person

๐Ÿ”‘ Keywords: EgoTwin, Viewpoint Alignment, Causal Interplay, diffusion transformer, head-centric motion representation

๐Ÿ’ก Category: Generative Models

๐ŸŒŸ Research Objective:

– The objective is to explore joint egocentric video and human motion generation by addressing challenges in viewpoint alignment and causal interplay.

๐Ÿ› ๏ธ Research Methods:

– The EgoTwin framework uses a diffusion transformer architecture, incorporating a head-centric motion representation and a cybernetics-inspired interaction mechanism to capture causal interplay.

๐Ÿ’ฌ Research Conclusions:

– Extensive experiments on a large-scale real-world dataset demonstrate the effectiveness of EgoTwin in achieving video-motion consistency.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.13013

5. CRISP: Persistent Concept Unlearning via Sparse Autoencoders

๐Ÿ”‘ Keywords: CRISP, sparse autoencoders, concept unlearning, large language models, semantically coherent separation

๐Ÿ’ก Category: Natural Language Processing

๐ŸŒŸ Research Objective:

– To introduce a parameter-efficient method, CRISP, for the persistent unlearning of unwanted knowledge in large language models, while maintaining the models’ utility.

๐Ÿ› ๏ธ Research Methods:

– Using sparse autoencoders to identify and suppress salient features across multiple layers of large language models to achieve persistent concept unlearning.

๐Ÿ’ฌ Research Conclusions:

– CRISP outperforms existing methods in safety-critical tasks by effectively removing harmful knowledge without compromising the general and in-domain capabilities of the models. It achieves a semantically coherent separation between target and benign concepts.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.13650

6. Selective Contrastive Learning for Weakly Supervised Affordance Grounding

๐Ÿ”‘ Keywords: affordance-relevant cues, weakly supervised affordance grounding, selective prototypical objectives, pixel contrastive objectives, CLIP

๐Ÿ’ก Category: Computer Vision

๐ŸŒŸ Research Objective:

– The paper aims to improve affordance grounding by employing selective prototypical and pixel contrastive objectives to learn affordance-relevant cues from third-person demonstrations.

๐Ÿ› ๏ธ Research Methods:

– The approach utilizes a combination of selective prototypical and pixel contrastive objectives to identify affordance-relevant parts and objects. This involves leveraging CLIP to locate action-associated objects in both egocentric and exocentric images, and cross-referencing these to find part-level affordance clues.

๐Ÿ’ฌ Research Conclusions:

– The method enhances the learning process by effectively distinguishing affordance-relevant regions from the background, thus improving the accuracy of affordance grounding. Experimentation confirms the effectiveness of this method.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.07877

7. AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions

๐Ÿ”‘ Keywords: AetherCode, Competitive programming, Large Language Models, benchmark, evaluation bias

๐Ÿ’ก Category: Natural Language Processing

๐ŸŒŸ Research Objective:

– Introduce AetherCode, a new benchmark designed to evaluate the reasoning and coding capabilities of Large Language Models in competitive programming scenarios.

๐Ÿ› ๏ธ Research Methods:

– Develop a benchmark with problems from top programming competitions, such as IOI and ICPC, offering broader coverage and higher difficulty.

– Incorporate expert-validated test suites using a combination of automated generation and human curation for rigorous assessment.

๐Ÿ’ฌ Research Conclusions:

– AetherCode addresses current benchmark limitations by providing a challenging and reliable measure of LLM capabilities, setting a new standard for future research in code reasoning.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.16402

8. TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill \& Decode Inference

๐Ÿ”‘ Keywords: Tensor Parallelism, Latent Representation, AI-Generated Summary, Prefilling, FlashAttention-3

๐Ÿ’ก Category: AI Systems and Tools

๐ŸŒŸ Research Objective:

– Propose Tensor-Parallel Latent Attention (TPLA) to enhance tensor parallelism efficiency and maintain strong representational capacity.

๐Ÿ› ๏ธ Research Methods:

– Partition latent representation and input dimensions across devices, perform attention per shard, and combine results with an all-reduce.

– Support MLA-style prefilling and tensor-parallel decoding using orthogonal transforms to reduce cross-shard interference.

๐Ÿ’ฌ Research Conclusions:

– Achieved up to 1.93x speedups in tensor parallelism without retraining while maintaining performance on commonsense and LongBench benchmarks.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.15881

9. End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

๐Ÿ”‘ Keywords: Agentic RAG system, Reinforcement Learning, Diagnostic Accuracy, Retrieval Corpus, AI in Healthcare

๐Ÿ’ก Category: AI in Healthcare

๐ŸŒŸ Research Objective:

– The aim is to enhance medical diagnosis accuracy by designing Deep-DxSearch, which integrates a large-scale retrieval corpus with tailored rewards.

๐Ÿ› ๏ธ Research Methods:

– The method involves using an end-to-end agentic RAG system with reinforcement learning, framing the LLM as the core agent within a retrieval corpus environment to enable retrieval-augmented reasoning.

๐Ÿ’ฌ Research Conclusions:

– The Deep-DxSearch system surpasses existing diagnostic systems such as GPT-4o in diagnostic accuracy and effectiveness, especially noteworthy in both common and rare diseases, validated through various experiments and case studies.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.15746

10. AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

๐Ÿ”‘ Keywords: AgentScope, Agentic Applications, ReAct Paradigm, AI Generated Summary, Large Language Models

๐Ÿ’ก Category: AI Systems and Tools

๐ŸŒŸ Research Objective:

– AgentScope aims to enhance agentic applications by providing flexible tool-based interactions, unified interfaces, and advanced infrastructure based on the ReAct paradigm.

๐Ÿ› ๏ธ Research Methods:

– The study abstracts foundational components for agentic applications and offers a system with extensible modules and unified interfaces. It envisions the ReAct paradigm and adopts asynchronous design for improved interaction patterns and execution efficiency.

๐Ÿ’ฌ Research Conclusions:

– AgentScope introduces improvements for agent-environment interactions, integrating built-in agents for practical scenarios. It provides robust engineering support and a scalable evaluation module, ensuring safe execution and facilitating deployment.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.16279

11. Do What? Teaching Vision-Language-Action Models to Reject the Impossible

๐Ÿ”‘ Keywords: Instruct-Verify-and-Act, Vision-Language-Action, false-premise instructions, contextually augmented, semi-synthetic dataset

๐Ÿ’ก Category: Robotics and Autonomous Systems

๐ŸŒŸ Research Objective:

– The objective of the study is to enhance Vision-Language-Action models to effectively detect and respond to false-premise instructions by utilizing a unified framework, Instruct-Verify-and-Act.

๐Ÿ› ๏ธ Research Methods:

– The research involved constructing a large-scale instruction tuning setup with structured language prompts and training a VLA model using a contextually augmented, semi-synthetic dataset with paired positive and false-premise instructions.

๐Ÿ’ฌ Research Conclusions:

– The proposed IVA framework significantly improves false premise detection accuracy by 97.56% over existing baselines and increases successful responses in false-premise scenarios by 50.78%.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.16292

12. Distilled-3DGS:Distilled 3D Gaussian Splatting

๐Ÿ”‘ Keywords: 3D Gaussian Splatting, Knowledge Distillation, Novel View Synthesis, Structural Similarity Loss

๐Ÿ’ก Category: Computer Vision

๐ŸŒŸ Research Objective:

– To enhance rendering quality and storage efficiency in 3D Gaussian Splatting by utilizing a knowledge distillation framework.

๐Ÿ› ๏ธ Research Methods:

– Implementation of a knowledge distillation framework featuring multiple teacher models (including standard, noise-augmented, and dropout-regularized variants) to optimize a lightweight student model.

– Introduction of a structural similarity loss to maintain spatial geometric consistency between teacher and student models.

๐Ÿ’ฌ Research Conclusions:

– The proposed Distilled-3DGS framework significantly improves both rendering quality and storage efficiency over existing state-of-the-art methods through comprehensive evaluations.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.14037

13. Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing

๐Ÿ”‘ Keywords: Sketch3DVE, 3D-aware video editing, viewpoint changes, point cloud editing, video diffusion model

๐Ÿ’ก Category: Computer Vision

๐ŸŒŸ Research Objective:

– Develop a sketch-based method for 3D-aware video editing that addresses challenges of sparse inputs and significant viewpoint changes.

๐Ÿ› ๏ธ Research Methods:

– Employ a combination of dense stereo estimation, point cloud editing, and a video diffusion model for precise manipulation of video content, incorporating sketching as an interactive geometry control tool.

๐Ÿ’ฌ Research Conclusions:

– Sketch3DVE effectively handles novel view content generation and merges new edits seamlessly with original videos, outperforming existing video editing methods.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.13797

14. InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

๐Ÿ”‘ Keywords: LLMs, personalized reasoning styles, social deduction games, InMind, Human-AI Interaction

๐Ÿ’ก Category: Human-AI Interaction

๐ŸŒŸ Research Objective:

– The study aims to evaluate LLMs’ ability to capture and apply personalized reasoning styles in social deduction games using the InMind framework.

๐Ÿ› ๏ธ Research Methods:

– Introduction of a cognitive evaluation framework called InMind, which enhances structured gameplay data with strategy traces and post-game reflections in different modes.

๐Ÿ’ฌ Research Conclusions:

– The study finds that general-purpose LLMs, including GPT-4, struggle with adapting evolving strategies in social contexts, while reasoning-enhanced LLMs show early potential for style-sensitive reasoning, highlighting limitations in current models for individualized, adaptive reasoning.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.16072

15. RotaTouille: Rotation Equivariant Deep Learning for Contours

๐Ÿ”‘ Keywords: RotaTouille, rotation equivariance, cyclic shift equivariance, complex-valued circular convolution

๐Ÿ’ก Category: Computer Vision

๐ŸŒŸ Research Objective:

– Introducing RotaTouille framework to achieve rotation and cyclic shift equivariance in contour data analysis.

๐Ÿ› ๏ธ Research Methods:

– Utilization of complex-valued circular convolution and development of equivariant non-linearities, coarsening layers, and global pooling layers.

๐Ÿ’ฌ Research Conclusions:

– Demonstrated the effectiveness of RotaTouille in shape classification, reconstruction, and contour regression through various experiments.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.16359

16. Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

๐Ÿ”‘ Keywords: hybrid framework, jailbreak attacks, malicious content detection, LLMs, dataset cleaning

๐Ÿ’ก Category: AI Systems and Tools

๐ŸŒŸ Research Objective:

– To propose a hybrid framework combining LLMs and human oversight to clean datasets and detect jailbreak attacks effectively.

๐Ÿ› ๏ธ Research Methods:

– Utilized a hybrid evaluation framework called MDH, combining LLM-based annotation with minimal human oversight, to balance accuracy and efficiency in malicious content detection.

๐Ÿ’ฌ Research Conclusions:

– Suggests that well-crafted developer messages can significantly increase jailbreak success, leading to the proposal of two new strategies, D-Attack and DH-CoT, to enhance jailbreak detection.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.10390

17. Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics

๐Ÿ”‘ Keywords: 3D human pose estimation, neural network, iterative optimization, Learnable SMPLify, single-pass regression model

๐Ÿ’ก Category: Computer Vision

๐ŸŒŸ Research Objective:

– To develop a faster and more generalized framework for 3D human pose and shape estimation by replacing iterative optimization with a neural network.

๐Ÿ› ๏ธ Research Methods:

– The framework employs a single-pass regression model and incorporates a temporal sampling strategy and human-centric normalization for data construction and generalization.

๐Ÿ’ฌ Research Conclusions:

– Learnable SMPLify achieves nearly 200x faster runtime compared to traditional methods while maintaining accuracy and operates effectively across various motions and as a model-agnostic tool.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.13562

18. CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

๐Ÿ”‘ Keywords: Chain-of-Thought, Reinforcement Learning, Contrastive learning, Large Language Models, Supervised Fine-Tuning

๐Ÿ’ก Category: Natural Language Processing

๐ŸŒŸ Research Objective:

– Enhance reasoning performance of Large Language Models (LLMs) by overcoming limitations of existing fine-tuning approaches through a novel contrastive learning method based on annotated Chain-of-Thought.

๐Ÿ› ๏ธ Research Methods:

– Propose a contrastive learning approach integrated with Reinforced Fine-Tuning, utilizing annotated Chain-of-Thought to stabilize training and improve model robustness and performance. Conduct experiments with three baseline approaches, two foundation models, and two datasets.

๐Ÿ’ฌ Research Conclusions:

– This approach significantly improves the stability and performance of LLMs, demonstrating advantages in robustness, efficiency, and performance improvements up to 30.62%. The method effectively leverages unsupervised learning signals to enhance fine-tuning.

๐Ÿ‘‰ Paper link: https://huggingface.co/papers/2508.15868

19.

๐Ÿ‘‰ Paper link: 

Blank Form (#4)
[email protected]

About

Ecosystem

Copyright 2025 AI Native Foundationยฉ . All rights reserved.โ€‹