AI Native Daily Paper Digest – 20260126

1. LongCat-Flash-Thinking-2601 Technical Report

🔑 Keywords: Mixture-of-Experts, agentic reasoning, domain-parallel expert training, asynchronous reinforcement learning, real-world noise

💡 Category: Knowledge Representation and Reasoning

🌟 Research Objective:

– Introduce a 560-billion-parameter Mixture-of-Experts model achieving state-of-the-art performance on agentic benchmarks.

🛠️ Research Methods:

– Utilize a unified training framework combining domain-parallel expert training with fusion.

– Extend asynchronous reinforcement learning for stable and efficient multi-environment training.

💬 Research Conclusions:

– The model demonstrates strong generalization to complex tool interactions and robust behavior in real-world environments.

– Enhanced robustness is achieved by incorporating real-world noise patterns into the training process (see the sketch below).

👉 Paper link: https://huggingface.co/papers/2601.16725
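
A minimal, hedged sketch of the idea behind that last point: expose the agent to noisy, real-world-style tool feedback during training. The specific noise types and the `agent`/`env` interface below are hypothetical illustrations, not details taken from the paper.

```python
import random

# Hypothetical noise patterns injected into tool observations during agentic
# training; the categories here are illustrative, not the paper's list.
def corrupt_tool_output(text: str, rng: random.Random) -> str:
    roll = rng.random()
    if roll < 0.1:
        return "ERROR: tool timed out"          # simulated tool failure
    if roll < 0.2:
        return text[: max(1, len(text) // 2)]   # simulated truncation
    if roll < 0.3:
        return text.upper()                     # simulated formatting drift
    return text                                 # most calls pass through clean

def noisy_rollout(agent, env, rng=random.Random(0)):
    """Roll out one episode, perturbing tool observations before the agent sees
    them. Assumes an agent with .act(obs) and an env with .reset()/.step(action)."""
    obs, done, trajectory = env.reset(), False, []
    while not done:
        action = agent.act(obs)
        obs, reward, done = env.step(action)
        obs = corrupt_tool_output(obs, rng)     # inject real-world-style noise
        trajectory.append((action, obs, reward))
    return trajectory

print(corrupt_tool_output("file_a.txt  file_b.txt", random.Random(0)))
```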

2. TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

🔑 Keywords: TwinBrainVLA, Vision-Language Model (VLM), robotic control, Asymmetric Mixture-of-Transformers, proprioception

💡 Category: Robotics and Autonomous Systems

🌟 Research Objective:

– Resolve the tension between maintaining high-level semantic understanding and learning fine-grained sensorimotor skills in robotic control with a novel model, TwinBrainVLA.

🛠️ Research Methods:

– TwinBrainVLA coordinates a generalist and a specialist VLM through an asymmetric mixture-of-transformers mechanism, combining a frozen “Left Brain” for visual reasoning with a trainable “Right Brain” for embodied perception (see the sketch below).

💬 Research Conclusions:

– Experiments reveal that TwinBrainVLA surpasses state-of-the-art baselines in manipulation performance while preserving comprehensive visual understanding, suggesting a promising path for developing versatile robots that balance semantic understanding and physical dexterity.

👉 Paper link: https://huggingface.co/papers/2601.14133
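
A small PyTorch sketch of the frozen-generalist-plus-trainable-specialist split described above. The tiny dimensions, the mean-pooled concatenation fusion, and the `left_brain`/`right_brain` naming are my own simplifications for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwinBranchPolicy(nn.Module):
    """Illustrative asymmetric two-branch model: a frozen generalist encoder
    plus a trainable specialist encoder, fused into an action head."""
    def __init__(self, dim=64, action_dim=7):
        super().__init__()
        self.left_brain = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        for p in self.left_brain.parameters():      # keep the semantic branch frozen
            p.requires_grad = False
        self.right_brain = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.action_head = nn.Linear(2 * dim, action_dim)

    def forward(self, vision_tokens, proprio_tokens):
        sem = self.left_brain(vision_tokens)        # frozen visual-reasoning features
        emb = self.right_brain(torch.cat([vision_tokens, proprio_tokens], dim=1))
        fused = torch.cat([sem.mean(dim=1), emb.mean(dim=1)], dim=-1)
        return self.action_head(fused)              # continuous action sketch

policy = TwinBranchPolicy()
print(policy(torch.randn(2, 16, 64), torch.randn(2, 4, 64)).shape)  # torch.Size([2, 7])
```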

3. Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

🔑 Keywords: Memory-V2V, video-to-video diffusion models, cross-consistency, dynamic tokenization, DiT backbone

💡 Category: Generative Models

🌟 Research Objective:

– The study aims to enhance multi-turn video editing by introducing Memory-V2V, which maintains cross-consistency through explicit memory mechanisms and efficient token compression.

🛠️ Research Methods:

– Memory-V2V employs an external cache for accurate retrieval and dynamic tokenization to condition the current edit on prior results. A learnable token compressor within the DiT backbone compresses redundant tokens while preserving essential visual cues (see the sketch below).

💬 Research Conclusions:

– The research demonstrates that Memory-V2V significantly improves cross-consistency and computational efficiency in video editing, with a 30% speedup, while maintaining or improving task-specific performance compared to state-of-the-art baselines.

👉 Paper link: https://huggingface.co/papers/2601.16296
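
A rough sketch of the two ingredients named above: an external memory of tokens from earlier edits, and a learnable compressor that pools them down before they condition the next edit. The dimensions, the attention-pooling compressor, and the cache interface are my own simplifications, not the paper's design; the diffusion step itself is omitted.

```python
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    """Pools N memory tokens down to K learned slots via cross-attention."""
    def __init__(self, dim=64, k=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, k, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, tokens):                        # tokens: (B, N, dim)
        q = self.queries.expand(tokens.size(0), -1, -1)
        compressed, _ = self.attn(q, tokens, tokens)  # (B, k, dim)
        return compressed

compressor = TokenCompressor()
memory_cache = {}                                     # turn index -> compressed tokens

def edit_turn(turn, current_video_tokens):
    """Condition the current edit on compressed tokens cached from earlier turns."""
    past = [memory_cache[t] for t in sorted(memory_cache) if t < turn]
    context = torch.cat(past + [current_video_tokens], dim=1) if past else current_video_tokens
    # ... the (omitted) diffusion editing step would attend over `context` here ...
    memory_cache[turn] = compressor(current_video_tokens).detach()
    return context

print(edit_turn(0, torch.randn(1, 32, 64)).shape)     # torch.Size([1, 32, 64])
print(edit_turn(1, torch.randn(1, 32, 64)).shape)     # torch.Size([1, 40, 64]) with cache
```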

4. Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

🔑 Keywords: Quantized RL training, FP8 precision, numerical mismatch, Jet-RL, stability issues

💡 Category: Reinforcement Learning

🌟 Research Objective:

– Address the challenges of quantized reinforcement learning training with FP8 precision, focusing on training stability and efficiency and on the numerical mismatch between rollout and training.

🛠️ Research Methods:

– A comprehensive analysis of FP8 RL training is conducted, leading to Jet-RL, a unified FP8 precision framework; extensive experiments compare Jet-RL with the traditional BF16-training + FP8-rollout strategy (see the sketch below).

💬 Research Conclusions:

– Jet-RL demonstrates significant improvements, achieving up to a 33% speedup in the rollout phase, a 41% speedup in the training phase, and a 16% overall speedup, while maintaining stable convergence and minimal accuracy degradation.

👉 Paper link: https://huggingface.co/papers/2601.14243
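
The idea in the title, one precision flow shared by rollout and training, can be illustrated with a fake-quantization helper applied identically on both paths, so the policy that generated the data matches the policy being optimized. This is a conceptual sketch only (it needs PyTorch >= 2.1 for the float8 dtype); it is not Jet-RL's actual kernels or scaling scheme.

```python
import torch

def fp8_quant_dequant(x: torch.Tensor) -> torch.Tensor:
    """Simulated FP8 (e4m3) round trip: scale into range, cast down, cast back."""
    amax = x.abs().max().clamp(min=1e-12)
    scale = 448.0 / amax                       # 448 is the largest e4m3 normal value
    x8 = (x * scale).to(torch.float8_e4m3fn)   # requires PyTorch >= 2.1
    return x8.to(x.dtype) / scale

def forward_unified(weight, activation):
    """Both rollout and the training forward pass use the SAME quantized operands,
    so there is no rollout/training numerical mismatch."""
    return fp8_quant_dequant(activation) @ fp8_quant_dequant(weight)

w, a = torch.randn(16, 16), torch.randn(4, 16)
rollout_logits = forward_unified(w, a)         # what sampling sees
training_logits = forward_unified(w, a)        # what the loss sees
assert torch.equal(rollout_logits, training_logits)
```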

5. DSGym: A Holistic Framework for Evaluating and Training Data Science Agents

🔑 Keywords: Data Science Agents, DSGym, AI Native, Benchmarks, Execution-verified

💡 Category: AI Systems and Tools

🌟 Research Objective:

– To introduce DSGym, a standardized framework for evaluating and training data science agents within self-contained execution environments.

🛠️ Research Methods:

– Development of DSGym with a modular architecture for adding tasks and tools, along with a curated task suite, DSGym-Tasks, including DSBio and DSPredict for expanded coverage (see the sketch below).

💬 Research Conclusions:

– DSGym provides a comprehensive testbed for rigorous end-to-end measurement of data science agents’ abilities, with agents trained in it outperforming existing models such as GPT-4o on its standardized benchmarks.

👉 Paper link: https://huggingface.co/papers/2601.16344
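
The "modular architecture for adding tasks and tools" suggests a registry-plus-sandbox shape. Below is a stripped-down, hypothetical interface sketch; the class and function names are invented for illustration and are not DSGym's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Task:
    name: str
    prompt: str                       # what the agent is asked to do
    verifier: Callable[[dict], bool]  # execution-verified success check

TASKS: Dict[str, Task] = {}

def register_task(task: Task) -> None:
    TASKS[task.name] = task

def evaluate(agent_fn: Callable[[str], str], task_name: str) -> bool:
    """Run the agent's generated code in an isolated namespace and verify outputs."""
    task = TASKS[task_name]
    namespace: dict = {}
    exec(agent_fn(task.prompt), namespace)    # toy stand-in for a sandboxed environment
    return task.verifier(namespace)

register_task(Task(
    name="toy-mean",
    prompt="Compute the mean of [1, 2, 3] and store it in a variable `answer`.",
    verifier=lambda ns: abs(ns.get("answer", 0) - 2.0) < 1e-9,
))

print(evaluate(lambda prompt: "answer = sum([1, 2, 3]) / 3", "toy-mean"))  # True
```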

6. GameTalk: Training LLMs for Strategic Conversation

🔑 Keywords: GameTalk, large language models, multi-turn interactions, reward shaping, fine-tuning methods

💡 Category: Natural Language Processing

🌟 Research Objective:

– To train large language models (LLMs) for strategic decision-making through multi-turn dialogue by optimizing global objectives across complete conversations.

🛠️ Research Methods:

– The adaptation of fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals dependent on the full interaction (see the sketch below).

– Evaluation on a set of complex games focusing on reasoning, coordination, and opponent modeling.

💬 Research Conclusions:

– Models trained with GameTalk outperform untrained models in complex game scenarios, with DPO showing the most significant improvements, highlighting conversational fine-tuning as a promising approach for LLMs in interactive environments.

👉 Paper link: https://huggingface.co/papers/2601.16276
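
The distinguishing piece above is that the reward depends on the full conversation rather than on a single turn. The sketch below shows one hedged, GRPO-flavoured way to turn an end-of-game outcome into per-conversation training signal via group-relative advantages; the reward function and sampled dialogues are placeholders, not the paper's setup.

```python
import statistics

def game_outcome_reward(conversation: list[str]) -> float:
    """Placeholder terminal reward for a whole conversation (e.g. deal reached = 1)."""
    return 1.0 if "I accept your proposal" in conversation[-1] else 0.0

def group_relative_advantages(conversations: list[list[str]]) -> list[float]:
    """GRPO-style: score each complete conversation, then normalize within the
    sampled group, so every turn of a conversation shares that advantage."""
    rewards = [game_outcome_reward(c) for c in conversations]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

sampled = [
    ["A: split 50/50?", "B: I accept your proposal"],
    ["A: I take 90%", "B: no deal"],
]
print(group_relative_advantages(sampled))   # [1.0, -1.0]
```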

7. Endless Terminals: Scaling RL Environments for Terminal Agents

🔑 Keywords: Endless Terminals, reinforcement learning, autonomous pipeline, procedural generation, PPO

💡 Category: Reinforcement Learning

🌟 Research Objective:

– Develop an autonomous pipeline, Endless Terminals, to generate procedural terminal tasks that enhance agent performance on various benchmarks.

🛠️ Research Methods:

– Implement a four-stage pipeline to generate tasks, build environments, and train agents using PPO with binary rewards and minimal interaction loops (see the sketch below).

💬 Research Conclusions:

– Models trained on Endless Terminals demonstrate substantial performance improvements on both synthetic and human-curated benchmarks, outperforming models with more complex structures.

👉 Paper link: https://huggingface.co/papers/2601.16443
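
The training recipe above, a minimal interaction loop with a binary end-of-episode reward, is easy to picture. Here is a toy sketch that uses a REINFORCE-style update as a stand-in for PPO on a made-up two-command "terminal" task; none of the specifics come from the paper.

```python
import torch

# Toy task: only the "ls" command is verified as correct (binary reward).
COMMANDS = ["ls", "rm -rf /tmp/x"]
logits = torch.zeros(len(COMMANDS), requires_grad=True)   # tiny softmax policy
opt = torch.optim.Adam([logits], lr=0.1)

def run_episode():
    """Minimal interaction loop: one command, binary reward from task verification."""
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = 1.0 if COMMANDS[action.item()] == "ls" else 0.0
    return dist.log_prob(action), reward

for step in range(200):                    # REINFORCE update as a PPO stand-in
    log_prob, reward = run_episode()
    loss = -(reward - 0.5) * log_prob      # constant 0.5 baseline reduces variance
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))        # probability mass shifts toward "ls"
```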

8. Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

🔑 Keywords: Parametric Skill Transfer, Large Language Models, Supervised Fine-Tuning, Skill Vector, Knowledge Adaptation

💡 Category: Natural Language Processing

🌟 Research Objective:

– The research introduces Parametric Skill Transfer (PaST) to facilitate efficient knowledge adaptation in large language models.

🛠️ Research Methods:

– The framework combines supervised fine-tuning with skill vector injection to achieve modular skill transfer for knowledge adaptation (see the sketch below).

💬 Research Conclusions:

– PaST outperforms existing methods by significant margins on question answering and tool-use tasks, in experiments on the SQuAD, LooGLE, and ToolBench benchmarks.

👉 Paper link: https://huggingface.co/papers/2601.11258
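
"Skill vector injection" is commonly realized as task-vector arithmetic: take the parameter delta between an RL-trained model and its base, then add a scaled copy of that delta to a knowledge-adapted (SFT) model. The sketch below assumes that reading; it is not necessarily PaST's exact formulation.

```python
import torch

def extract_skill_vector(base_state, rl_state):
    """Skill vector = parameter delta introduced by RL training over the base model."""
    return {k: rl_state[k] - base_state[k] for k in base_state}

def inject_skill(sft_state, skill_vector, alpha=1.0):
    """Add the scaled skill delta on top of a knowledge-adapted (SFT) model."""
    return {k: sft_state[k] + alpha * skill_vector[k] for k in sft_state}

# Toy state_dicts standing in for full LLM checkpoints.
base = {"w": torch.zeros(2, 2)}
rl   = {"w": torch.full((2, 2), 0.3)}   # base + RL-acquired skills
sft  = {"w": torch.full((2, 2), 1.0)}   # base + newly adapted knowledge

merged = inject_skill(sft, extract_skill_vector(base, rl), alpha=0.5)
print(merged["w"])                       # knowledge weights plus half the skill delta
```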

9. Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization

🔑 Keywords: Prompt Optimization, Code Generation, Software Engineering, Large Language Models, Prompt Engineering

💡 Category: AI Systems and Tools

🌟 Research Objective:

– To derive and evaluate prompt optimization guidelines for code generation tasks in software engineering.

🛠️ Research Methods:

– An iterative, test-driven approach was used to refine code generation prompts, analyzing outcomes to identify improvement patterns (see the sketch below).

💬 Research Conclusions:

– Ten specific guidelines were identified for improving prompts, focusing on better I/O specification, conditions, examples, and clarity. An assessment with practitioners revealed insights into usage and perceived usefulness, informing the development of LLM-aided software tools.

👉 Paper link: https://huggingface.co/papers/2601.13118
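
An iterative, test-driven prompt refinement loop has roughly the shape below. `generate_code` and `refine_prompt` are hypothetical stand-ins (a canned answer and a hard-coded edit) so the sketch runs offline; only the loop structure is meant to reflect the described approach.

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

def generate_code(prompt: str) -> str:
    """Hypothetical LLM call; returns a canned answer so the sketch runs offline."""
    return "def add(a, b):\n    return a + b\n"

def run_tests(code: str, tests: str) -> bool:
    """Write candidate code plus tests to a temp file and execute them."""
    with tempfile.TemporaryDirectory() as d:
        path = pathlib.Path(d) / "candidate.py"
        path.write_text(code + "\n" + tests)
        return subprocess.run([sys.executable, str(path)]).returncode == 0

def refine_prompt(prompt: str) -> str:
    """Hypothetical refinement, e.g. add clearer I/O specification or an example."""
    return prompt + "\nReturn only a Python function. Example: add(2, 3) == 5."

prompt = "Write a function add(a, b)."
tests = textwrap.dedent("""
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
""")
for attempt in range(3):                  # iterate until the tests pass
    code = generate_code(prompt)
    if run_tests(code, tests):
        break
    prompt = refine_prompt(prompt)        # failure analysis drives the next prompt
```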

10. VISTA-PATH: An interactive foundation model for pathology image segmentation and quantitative analysis in computational pathology

🔑 Keywords: VISTA-PATH, semantic segmentation, histopathology image, foundation models, human-in-the-loop

💡 Category: AI in Healthcare

🌟 Research Objective:

– Introduce VISTA-PATH, an interactive pathology segmentation model designed for precise multi-class segmentation and clinical interpretation in digital pathology.

🛠️ Research Methods:

– Utilize visual context, semantic descriptions, expert feedback, and optional spatial prompts to enable accurate segmentation across heterogeneous pathology images.

– Curate a large-scale dataset, VISTA-PATH Data, comprising over 1.6 million image-mask-text triplets spanning 9 organs and 93 tissue classes.

💬 Research Conclusions:

– VISTA-PATH consistently outperforms existing segmentation models.

– It supports dynamic refinement with human-in-the-loop feedback, improving tissue microenvironment analysis, and its Tumor Interaction Score correlates strongly with patient survival.

👉 Paper link: https://huggingface.co/papers/2601.16451

11. Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

🔑 Keywords: Theory of Mind, AI-Generated Summary, Supervised Fine-Tuning, Reinforcement Learning, RebuttalAgent

💡 Category: Natural Language Processing

🌟 Research Objective:

– Introduce RebuttalAgent, the first framework to apply Theory of Mind to academic rebuttals, transforming them into a strategic communication process.

🛠️ Research Methods:

– Employ a ToM-Strategy-Response pipeline: supervised fine-tuning for initial training, followed by reinforcement learning with a self-reward mechanism that enables efficient automated evaluation.

💬 Research Conclusions:

– RebuttalAgent significantly outperforms the base model by an average of 18.3% on automated metrics and surpasses advanced proprietary models in both automated and human evaluations.

👉 Paper link: https://huggingface.co/papers/2601.15715

12. ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

🔑 Keywords: ChartVerse, Vision Language Models, Rollout Posterior Entropy, answer-first paradigm, Chain-of-Thought

💡 Category: Multi-Modal Learning

🌟 Research Objective:

– The study aims to enhance Vision Language Model (VLM) performance by creating a scalable framework, ChartVerse, to generate complex charts and reliable reasoning data.

🛠️ Research Methods:

– Introduces the Rollout Posterior Entropy metric to assess chart complexity and utilizes a complexity-aware chart coder to synthesize diverse charts (see the sketch below).

– Implements truth-anchored inverse QA synthesis with an answer-first paradigm to ensure accurate and consistent reasoning.

💬 Research Conclusions:

– ChartVerse-8B achieves state-of-the-art performance, surpassing its teacher model and comparing favorably to stronger alternatives.

👉 Paper link: https://huggingface.co/papers/2601.13606
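
One natural reading of "Rollout Posterior Entropy" is the entropy of the answer distribution obtained by sampling a model several times on the same chart question: the more the rollouts disagree, the harder the chart. The snippet below computes that quantity under this assumption; it is not guaranteed to match the paper's exact definition.

```python
import math
from collections import Counter

def rollout_posterior_entropy(sampled_answers: list[str]) -> float:
    """Shannon entropy (in nats) of the empirical answer distribution over rollouts."""
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

easy_chart = ["42", "42", "42", "42"]            # rollouts agree    -> low entropy
hard_chart = ["42", "40", "45", "illegible"]     # rollouts disagree -> high entropy
print(rollout_posterior_entropy(easy_chart))     # 0.0
print(rollout_posterior_entropy(hard_chart))     # ~1.39
```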

13. Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain

🔑 Keywords: Turkish Legal Domain, Domain Adaptation, Encoder Models, Continual Pre-training, Retrieval Performance

💡 Category: Natural Language Processing

🌟 Research Objective:

– Develop specialized Turkish legal language models through domain adaptation strategies to improve legal text processing.

🛠️ Research Methods:

– Utilize ModernBERT-based bidirectional encoder models pre-trained on a large Turkish corpus.

– Implement a checkpoint selection strategy to optimize retrieval performance (see the sketch below).

– Apply continual pre-training with controlled curriculum learning for decoder models.

💬 Research Conclusions:

– Achieved top-3 rankings on the Turkish retrieval leaderboard with efficient models.

– Demonstrated a cost-effective single-stage pre-training approach with high production efficiency.

– Realized a 36.2% perplexity reduction on Turkish legal text through domain adaptation.

👉 Paper link: https://huggingface.co/papers/2601.16018
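
Checkpoint selection for retrieval usually reduces to scoring each intermediate checkpoint on a held-out retrieval set and keeping the best. The sketch below only shows that shape: `evaluate_retrieval` is a hypothetical stub with hard-coded scores, and the metric choice (e.g. nDCG@10) is my assumption rather than the paper's.

```python
from pathlib import Path

def evaluate_retrieval(checkpoint: Path, dev_queries) -> float:
    """Hypothetical stub: embed dev queries/documents with this checkpoint and
    return a retrieval metric such as nDCG@10 (scores hard-coded here)."""
    return {"ckpt-1000": 0.61, "ckpt-2000": 0.68, "ckpt-3000": 0.66}[checkpoint.stem]

def select_checkpoint(checkpoints, dev_queries) -> Path:
    scores = {ck: evaluate_retrieval(ck, dev_queries) for ck in checkpoints}
    return max(scores, key=scores.get)    # keep the retrieval-optimal checkpoint

ckpts = [Path("ckpt-1000"), Path("ckpt-2000"), Path("ckpt-3000")]
print(select_checkpoint(ckpts, dev_queries=None).stem)   # "ckpt-2000"
```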

14. MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

🔑 Keywords: Human-AI collaboration, constructive critique, player experience, creative co-designers, Mechanics-Dynamics-Aesthetics

💡 Category: Human-AI Interaction

🌟 Research Objective:

– The study aims to enhance Human-AI collaboration in board game design by offering constructive critique that aligns with player experiences, bridging the gap in current AI systems.

🛠️ Research Methods:

– The research involved curating a dataset of 1,727 rulebooks and 150K reviews, using Mechanics-Dynamics-Aesthetics (MDA) reasoning to bridge written rules and player experience, and developing MeepleLM to simulate feedback from diverse player archetypes.

💬 Research Conclusions:

– MeepleLM outperforms existing commercial models in community alignment and critique quality, serving as a reliable virtual playtester, and representing a key advancement in audience-aligned Human-AI collaboration.

👉 Paper link: https://huggingface.co/papers/2601.07251

15. SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

🔑 Keywords: Diffusion Transformers, video generation, sparse attention, linear attention, input-dependent gating mechanism

💡 Category: Generative Models

🌟 Research Objective:

– Enhance Diffusion Transformers for video generation to improve sparsity and speed while maintaining quality.

🛠️ Research Methods:

– Introduced SALAD, a method combining linear and sparse attention branches with an input-dependent gating mechanism for balancing the two (see the sketch below).

💬 Research Conclusions:

– Achieved 90% sparsity and a 1.72x inference speedup while maintaining generation quality comparable to full attention, with efficient fine-tuning using minimal data.

👉 Paper link: https://huggingface.co/papers/2601.16515
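
The gated two-branch idea can be written down compactly: a cheap linear-attention branch does most of the mixing, a sparse branch keeps a few exact interactions (a simple non-overlapping local window is used as a stand-in below), and a learned per-token gate blends them. The dimensions and branch details are illustrative assumptions, not SALAD's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLinearSparseAttention(nn.Module):
    def __init__(self, dim=64, window=4):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, 1)            # input-dependent per-token gate
        self.window = window

    def forward(self, x):                        # x: (B, N, dim), N divisible by window
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Linear-attention branch: kernel feature maps, cost linear in N.
        qf, kf = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bnd,bne->bde", kf, v)
        z = 1.0 / (torch.einsum("bnd,bd->bn", qf, kf.sum(dim=1)) + 1e-6)
        linear_out = torch.einsum("bnd,bde,bn->bne", qf, kv, z)

        # Sparse-branch stand-in: exact attention inside non-overlapping local windows.
        B, N, D = x.shape
        qw = q.reshape(B, N // self.window, self.window, D)
        kw = k.reshape(B, N // self.window, self.window, D)
        vw = v.reshape(B, N // self.window, self.window, D)
        attn = torch.softmax(qw @ kw.transpose(-1, -2) / D**0.5, dim=-1)
        sparse_out = (attn @ vw).reshape(B, N, D)

        g = torch.sigmoid(self.gate(x))          # balances the two branches per token
        return g * linear_out + (1 - g) * sparse_out

layer = GatedLinearSparseAttention()
print(layer(torch.randn(2, 16, 64)).shape)       # torch.Size([2, 16, 64])
```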

16. Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

🔑 Keywords: Deep Research Agents, self-evolving, rubric-based feedback, verification, inference-time scaling

💡 Category: AI Systems and Tools

🌟 Research Objective:

– To propose a self-evolving framework for Deep Research Agents that enhances performance through iterative verification and rubric-based feedback during inference.

🛠️ Research Methods:

– Introduction of DeepVerifier, a rubric-based outcome reward verifier that integrates as a plug-and-play module during test-time inference (see the sketch below).

– Development of a DRA Failure Taxonomy to classify agent failures systematically, informing rubric design.

💬 Research Conclusions:

– The approach enables agents to self-improve by iteratively refining responses, achieving an F1 score improvement of 12%-48% compared to baseline methods.

– Test-time scaling achieves accuracy gains of 8%-11% on challenging datasets, facilitating agent self-evolution without additional training.

– DeepVerifier-4K, a curated dataset, is released to support open-source advances in robust verification capabilities.

👉 Paper link: https://huggingface.co/papers/2601.15808
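
The test-time loop described above is: draft an answer, have a rubric-based verifier score it, and revise with the verifier's feedback until the score clears a threshold or the budget runs out. The sketch below captures only that control flow; `agent_answer` and `verify_against_rubrics` are invented stubs, not DeepVerifier's interface.

```python
def agent_answer(question: str, feedback: str = "") -> str:
    """Hypothetical research-agent call; verifier feedback is folded into the redraft."""
    return "draft answer" + (" (revised: " + feedback + ")" if feedback else "")

def verify_against_rubrics(answer: str, rubrics: list[str]) -> tuple[float, str]:
    """Hypothetical rubric verifier: fraction of rubric items satisfied, plus feedback."""
    satisfied = [r for r in rubrics if r.split()[0].lower() in answer.lower()]
    missing = [r for r in rubrics if r not in satisfied]
    feedback = "address: " + "; ".join(missing) if missing else ""
    return len(satisfied) / len(rubrics), feedback

def refine_at_test_time(question, rubrics, budget=4, threshold=0.99):
    answer = agent_answer(question)
    for _ in range(budget):                        # the inference-time scaling knob
        score, feedback = verify_against_rubrics(answer, rubrics)
        if score >= threshold:
            break
        answer = agent_answer(question, feedback)  # self-refine using rubric feedback
    return answer

print(refine_at_test_time("Summarize the evidence.", ["cites sources", "revised draft"]))
```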

17. VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

🔑 Keywords: Vision-Language Models, multi-step visual interactions, perception-memory-action integration, symbolic puzzles, feedback

💡 Category: Multi-Modal Learning

🌟 Research Objective:

– To address the challenges faced by Vision-Language Models (VLMs) in multi-step visual interaction tasks, especially in integrating perception, memory, and action over long horizons.

🛠️ Research Methods:

– Introduced VisGym, a gymnasium of 17 environments for evaluating and training VLMs, spanning tasks such as symbolic puzzles, real-image understanding, and navigation, with adjustable difficulty and feedback mechanisms (see the sketch below).

💬 Research Conclusions:

– Frontier models struggle in interactive environments, with low success rates on complex tasks; they perform worse with unbounded histories than with truncated windows, necessitating aids such as goal observations and textual feedback to improve outcomes.

👉 Paper link: https://huggingface.co/papers/2601.16973
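
A gym-style multimodal environment reduces to an interface like the one below: observations pair an image with text, difficulty is a constructor knob, and step() can return textual feedback. This is a generic sketch of that interface shape, written for this digest; it is not VisGym's actual API.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image: list[list[int]]   # placeholder pixel grid; a real env would return an image array
    text: str                # instruction, goal observation, or textual feedback

class GridGoalEnv:
    """Toy multi-step visual task: walk to the goal cell of a small 1-D grid."""
    def __init__(self, difficulty: int = 1):
        self.size = 3 + 2 * difficulty          # adjustable difficulty
        self.pos, self.goal = 0, self.size - 1

    def reset(self) -> Observation:
        self.pos = 0
        return Observation(self._render(), f"Reach cell {self.goal}.")

    def step(self, action: str):
        self.pos += 1 if action == "right" else -1
        self.pos = max(0, min(self.size - 1, self.pos))
        done = self.pos == self.goal
        feedback = "goal reached" if done else f"now at cell {self.pos}"
        return Observation(self._render(), feedback), float(done), done

    def _render(self):
        return [[1 if c == self.pos else 0 for c in range(self.size)]]

env = GridGoalEnv(difficulty=1)
obs, done = env.reset(), False
while not done:                                 # trivial scripted agent for illustration
    obs, reward, done = env.step("right")
print(obs.text)                                 # "goal reached"
```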

18. SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

🔑 Keywords: SWE-Pruner, Task-aware pruning, Neural skimmer, Token reduction

💡 Category: AI Systems and Tools

🌟 Research Objective:

– Develop SWE-Pruner, a self-adaptive context pruning framework that enhances coding agents by reducing token usage while preserving performance.

🛠️ Research Methods:

– Implement task-aware adaptive pruning using a lightweight neural skimmer with 0.6B parameters, designed to select relevant lines based on an explicit goal (see the sketch below).

💬 Research Conclusions:

– SWE-Pruner achieved substantial token reduction (23-54%) in agent tasks and up to 14.84x compression in single-turn tasks, with minimal impact on performance.

👉 Paper link: https://huggingface.co/papers/2601.16746
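
Goal-conditioned line pruning has a simple skeleton: score every line of the context for relevance to the stated goal, keep the high scorers plus a little surrounding context, and drop the rest. In the sketch below a crude term-overlap heuristic stands in for the 0.6B neural skimmer; the threshold and neighbor window are arbitrary choices, not SWE-Pruner's.

```python
def score_line(line: str, goal: str) -> float:
    """Stand-in relevance scorer: fraction of goal terms that occur in the line.
    SWE-Pruner uses a small neural skimmer instead of this heuristic."""
    terms = [w.lower() for w in goal.split()]
    return sum(1 for t in terms if t in line.lower()) / len(terms)

def prune_context(context: str, goal: str, threshold=0.15, keep_neighbors=1) -> str:
    lines = context.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if score_line(line, goal) >= threshold:
            for j in range(i - keep_neighbors, i + keep_neighbors + 1):
                if 0 <= j < len(lines):
                    keep.add(j)                  # keep relevant lines plus neighbors
    return "\n".join(lines[i] for i in sorted(keep))

source = """def parse_config(path):
    return json.load(open(path))

def unrelated_helper():
    pass

def save_config(cfg, path):
    json.dump(cfg, open(path, "w"))"""

print(prune_context(source, goal="fix the bug in parse_config"))
```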

Copyright 2026 AI Native Foundation©. All rights reserved.