<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Papers &#8211; AI Native Foundation</title>
	<atom:link href="https://ainativefoundation.org/category/insights/papers/feed/" rel="self" type="application/rss+xml" />
	<link>https://ainativefoundation.org</link>
	<description></description>
	<lastBuildDate>Thu, 21 May 2026 00:40:09 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://ainativefoundation.org/wp-content/uploads/2024/05/cropped-favicon-32x32.png</url>
	<title>Papers &#8211; AI Native Foundation</title>
	<link>https://ainativefoundation.org</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>AI Native Daily Paper Digest &#8211; 20260520</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260520/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Thu, 21 May 2026 00:40:09 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260520/</guid>

					<description><![CDATA[1. When Vision Speaks for Sound 🔑 Keywords: Audio-Visual Clever Hans effect, intervention-driven probing framework, audio-visual alignment, video-capable MLLMs, counterfactual audio edits [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. When Vision Speaks for Sound</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Audio-Visual Clever Hans effect, intervention-driven probing framework, audio-visual alignment, video-capable MLLMs, counterfactual audio edits</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to diagnose and improve the audio-visual alignment in video-capable multimodal large language models (MLLMs), specifically identifying the reliance on visual cues for audio understanding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced Thud, an intervention-driven probing framework, utilizing counterfactual audio edits: Shift (temporal synchronization), Mute (sound existence), and Swap (audio-visual consistency) to study audio-visual alignment failures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; A two-stage alignment recipe was proposed, showing a 28 percentage point improvement in addressing intervention dimensions, with slight advancements in general video and audio-visual QA benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.16403" target="_blank">https://huggingface.co/papers/2605.16403</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img fetchpriority="high" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260520233014121.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. Active Learners as Efficient PRP Rerankers</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Pairwise Ranking Prompting, active learning, noisy pairwise comparisons, call budget, position bias</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Reformulate pairwise ranking prompting as active learning from noisy comparisons to enhance ranking quality and address position bias.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce active rankers as replacements to improve NDCG@10 within call constraints and utilize a randomized oracle to mitigate position bias.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The framework enables unbiased ranking through a noise-robust approach, optimizing rankings without incurring the cost of bidirectional calls.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14236" target="_blank">https://huggingface.co/papers/2605.14236</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260520233027236.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260518</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260518/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Tue, 19 May 2026 00:41:24 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260518/</guid>

					<description><![CDATA[1. CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence 🔑 Keywords: CiteVQA, Attribution Hallucination, Document Vision-Language Models, Doc-VQA 💡 Category: Multi-Modal Learning [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CiteVQA, Attribution Hallucination, Document Vision-Language Models, Doc-VQA</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce CiteVQA to evaluate document vision-language models, ensuring both answer accuracy and correct citation of evidence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop a benchmark comprising 1,897 questions across diverse documents, using automated pipelines and expert review for validation.</p>
<p>   &#8211; Evaluate models using Strict Attributed Accuracy to assess the reliability of answers with correct source citation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current models exhibit notable attribution hallucinations, providing accurate answers but citing incorrect sources.</p>
<p>   &#8211; The best-performing model achieves a Strict Attributed Accuracy of only 76.0, highlighting a significant gap in model reliability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12882" target="_blank">https://huggingface.co/papers/2605.12882</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233015385.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. MMSkills: Towards Multimodal Skills for General Visual Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal procedural knowledge, Visual agents, Reusable skills, Decision making, MMSkills</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to enhance visual agents&#8217; decision-making capabilities by developing MMSkills, which leverage external reusable skills through a structured multimodal procedural knowledge framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of a framework that represents, generates, and utilizes multimodal procedures, utilizing a trajectory-to-skill generator to convert public non-evaluation trajectories into usable multimodal skills.</p>
<p>   &#8211; Implementation of a branch-loaded multimodal skill agent for inspecting and aligning state cards and keyframes with the live environment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MMSkills improve both frontier and smaller visual agents in GUI and game-based benchmarks, demonstrating the effectiveness of integrating external multimodal procedural knowledge with model-internal priors.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13527" target="_blank">https://huggingface.co/papers/2605.13527</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233046322.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-policy distillation, parameter-level mechanisms, update trajectory, plug-and-play acceleration, EffOPD</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to explore the parameter-level mechanisms that make On-policy distillation (OPD) efficient and to introduce a method for accelerating OPD training in large language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research argues that OPD&#8217;s efficiency comes from establishing a stable update trajectory early in training.</p>
<p>   &#8211; EffOPD, a plug-and-play acceleration method, is proposed which selects an extrapolation step size without additional modules or complex tuning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EffOPD achieves a 3x speedup in training while maintaining comparable performance, offering insights into efficient post-training for large models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11739" target="_blank">https://huggingface.co/papers/2605.11739</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233121621.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CoRD, collaborative multi-teacher decoding, reasoning trajectories, predictive perplexity scoring, beam search</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce CoRD, a framework to enhance reasoning through collaborative multi-teacher decoding and step-wise reasoning synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized predictive perplexity-based scoring and beam search for constructing reasoning trajectories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CoRD improves reasoning data quality and student performance, demonstrating well generalization across various settings with efficient supervision.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02290" target="_blank">https://huggingface.co/papers/2605.02290</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233156606.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Flash-GRPO, training efficiency, video diffusion models, iso-temporal grouping, temporal gradient rectification</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The aim is to enhance the training efficiency of video diffusion models by resolving temporal variance and gradient inconsistency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented Flash-GRPO, a single-step training framework utilizing iso-temporal grouping to maintain temporal consistency and temporal gradient rectification to manage gradient magnitudes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Flash-GRPO significantly accelerates training and achieves state-of-the-art alignment quality without compromising stability, effectively supporting models ranging from 1.3B to 14B parameters.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15980" target="_blank">https://huggingface.co/papers/2605.15980</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233231592.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. ReactiveGWM: Steering NPC in Reactive Game World Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ReactiveGWM, NPC behaviors, cross-attention modules, game-agnostic representation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce ReactiveGWM, which enables dynamic interactions between players and NPCs in game worlds by decoupling player controls from NPC behaviors using diffusion models with cross-attention modules.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized cross-attention modules and a diffusion backbone to achieve a game-agnostic representation of interactive logic, facilitating zero-shot strategy transfer to various game world models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ReactiveGWM demonstrates the ability to maintain detailed player control and robust NPC strategy adherence, allowing for scalable and strategy-rich interactions without domain-specific retraining, as evidenced in evaluations on Street Fighter games.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15256" target="_blank">https://huggingface.co/papers/2605.15256</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260518233305933.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Solvita, Continuous Learning, Reinforcement Learning, Program Synthesis, Multi-agent Frameworks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Solvita, an agentic evolution framework, that accomplishes state-of-the-art performance in continuous learning for code generation through reinforcement learning applied to knowledge networks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implement a closed-loop system with four specialized agents, each paired with a graph-structured knowledge network, to address the stateless nature of current frameworks and enable dynamic learning via reinforcement learning updates.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Solvita significantly outperforms existing multi-agent code-generation systems and improves accuracy substantially compared to single-pass baselines in diverse competitive programming environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15301" target="_blank">https://huggingface.co/papers/2605.15301</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233341964.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: 3D visual learning, panoramic RGB-D-pose data, ERP viewpoint curator, geometry-consistent</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to convert 3D assets into sparse panoramic RGB-D-pose data that ensures complete scene coverage with minimal redundancy and auditable provenance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors propose COVER, a training-free ERP viewpoint curator that projects observed geometry into candidate probes, scores coverage, and penalizes depth conflicts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Using COVER, the authors developed CM-EVS, a panoramic RGB-D-pose dataset that demonstrates improved trade-off between coverage and conflict, providing a sparse, compact, and auditable resource for geometry-consistent panoramic 3D learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15597" target="_blank">https://huggingface.co/papers/2605.15597</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233412967.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. Unlocking Dense Metric Depth Estimation in VLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-Language Models, dense geometry, depth head, vision-text supervision, 3D spatial reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; DepthVLM aims to enhance Vision-Language Models&#8217; capabilities in 3D spatial reasoning by adding dense geometry prediction while maintaining multimodal capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces a lightweight depth head added to the LLM backbone and uses a unified vision-text supervision paradigm with a two-stage training schedule to generate full-resolution depth maps.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DepthVLM outperforms existing VLMs in inference efficiency, surpasses leading vision models, and improves complex 3D spatial reasoning, indicating progress towards a unified foundation model.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15876" target="_blank">https://huggingface.co/papers/2605.15876</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233444422.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. Steered LLM Activations are Non-Surjective</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Activation steering, white-box control, interpretability, safety research, surjectivity</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore whether activation steering in language models can be replicated by standard textual prompts and establish a distinction between white-box and black-box control methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study involves casting the capability of steered behavior realization as a surjectivity problem, with both theoretical proofs and empirical illustrations across three widely used LLMs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; It was concluded that activation steering pushes the model&#8217;s residual streams off the state manifold achievable by discrete textual prompts, highlighting a formal separation between white-box steerability and black-box prompting.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.09839" target="_blank">https://huggingface.co/papers/2604.09839</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233515840.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. Efficient Image Synthesis with Sphere Latent Encoder</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Few-step image generation, Latent denoising model, Pixel space, Image encoder, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to improve efficiency and performance in few-step image generation by separating pixel-space operations from latent denoising training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A decoupled framework is implemented, featuring a fixed pretrained image encoder and a separate latent denoising model trained in a spherical latent space. This approach eliminates repeated pixel-space operations during training and inference.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method outperforms existing models like Sphere Encoder on datasets such as Animal-Faces, Oxford-Flowers, and ImageNet-1K in both image generation quality and inference speed, while maintaining competitiveness with few-step and multi-step baselines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15592" target="_blank">https://huggingface.co/papers/2605.15592</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233551640.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FFAvatar, 3D head avatar reconstruction, FLAME parameters, Multi-View Query-Former, real-time deployment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce FFAvatar, a feed-forward framework for high-quality 3D head avatar reconstruction from few unposed images.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of Multi-View Query-Former to fuse information from several images.</p>
<p>   &#8211; End-to-end FLAME parameter prediction directly from pixels.</p>
<p>   &#8211; Implementation of a three-stage training curriculum including scalable pretraining, multi-view fine-tuning, and optional personalization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FFAvatar outperforms existing models, achieving a substantial performance gain on the NeRSemble benchmark.</p>
<p>   &#8211; It enables rapid avatar reconstruction and supports real-time animation on a single NVIDIA A100 GPU.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15320" target="_blank">https://huggingface.co/papers/2605.15320</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233624623.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WorldAct framework, 3D environments, multimodal agents, geometric reconstruction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces WorldAct, a framework designed to transform static 3D generated environments into editable and interactive scenes, enhancing their utility in immersive content creation and embodied simulation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a multimodal agent to perform scene decomposition, identify actionable objects, reconstruct geometrically aligned object-level meshes, and apply 3D inpainting to restore backgrounds.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; WorldAct enhances interaction scenarios by enabling object-level editing, collision-aware manipulation, and embodied task execution while maintaining global scene coherence. This suggests a practical step towards developing editable and interactive 3D world models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15843" target="_blank">https://huggingface.co/papers/2605.15843</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260518233701033.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. Look Before You Leap: Autonomous Exploration for LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: autonomous exploration, Exploration Checkpoint Coverage, reinforcement learning, Explore-then-Act paradigm, interaction budget</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance agent adaptability by introducing a focus on autonomous exploration capabilities, addressing premature exploitation issues in large language model-based agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A novel metric called Exploration Checkpoint Coverage is introduced to measure exploration breadth, and a new training strategy that integrates task-execution and exploration rollouts with optimized verifiable rewards is developed.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study concludes that systematic exploration training is essential for developing agents that are generalizable and effective in real-world environments, proposing the Explore-then-Act paradigm to improve overall agent performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.16143" target="_blank">https://huggingface.co/papers/2605.16143</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233751585.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. ChangeFlow &#8212; Latent Rectified Flow for Change Detection in Remote Sensing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Change Detection, Change Mask, Generative Formulation, Latent Space, Rectified Flow</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study proposes ChangeFlow, a generative framework for remote sensing change detection, aiming to improve accuracy and robustness through synthesis of change masks in latent space.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a structured yet lightweight conditioning signal and a stochastic design to support sampling-based prediction ensembling, allowing aggregation of multiple predicted change masks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ChangeFlow enhances the robustness of change detection models, achieving an average F1 score of 80.4%, outperforming previous methods by 1.3 points on average, while maintaining competitive inference speed.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15375" target="_blank">https://huggingface.co/papers/2605.15375</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233858836.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. Learning POMDP World Models from Observations with Language-Model Priors</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Pinductor, POMDP, LLM, Sample Efficiency, World-Model Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate if language-model priors can reduce costly interactions in learning POMDP models from limited observation-action data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce Pinductor, which uses an LLM to propose and iteratively refine POMDP models based on belief-based likelihood scores from minimal observation-action trajectories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Pinductor matches the performance of methods with privileged hidden state access and significantly exceeds the sample efficiency of traditional tabular approaches, establishing language-model priors as a practical tool for efficient world-model learning in partially observable environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13740" target="_blank">https://huggingface.co/papers/2605.13740</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233824853.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: sequence-to-sequence modeling, autoregressive decoder, floorplan reconstruction, learnable anchors, attention mechanism</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to reconstruct structured vector graphics from rasterized floorplan images using a sequence-to-sequence paradigm to accurately preserve the geometry and semantics of complex floorplans.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The proposed method employs an autoregressive decoder, which predicts polygon corners based on image features and prior corners, utilizing learnable anchors representing spatial coordinates to guide attention mechanisms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed Raster2Seq method achieves state-of-the-art performance on benchmarks like Structure3D and Raster2Graph, and generalizes well to challenging datasets such as WAFFLE with complex geometric variations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2602.09016" target="_blank">https://huggingface.co/papers/2602.09016</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234046609.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal-Physics Evaluation, Vision-Language Reasoning, Train-Eval Contamination, Translation Drift, MCQ Saturation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to identify and document previously undetected issues in multimodal-physics evaluations that distort vision-language reasoning measurements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research utilizes a comprehensive, multi-stage auditing process, including Jaccard, mxbai-embed-large cosine, and Haiku-4.5 LLM-judge audits, to reveal near-duplicates and paraphrase candidates, as well as evaluate translations and response formats.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Significant distortions exist in current evaluation practices due to train-eval contamination, translation drift, and MCQ saturation. New artifacts released address these gaps and demonstrate improved outcomes in multimodal-physics reasoning tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14040" target="_blank">https://huggingface.co/papers/2605.14040</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234010845.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Group-Query Latent Attention, Multi-head Latent Attention, Efficient Inference, AI Native, Tensor Parallelism</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research introduces Group-Query Latent Attention (GQLA), which enables efficient inference on multiple hardware without the need for retraining by exposing multiple decoding paths from a single set of trained weights.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs a minimal modification of Multi-head Latent Attention (MLA) to create GQLA with two algebraically equivalent decoding paths suitable for high-performance and commodity GPUs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GQLA&#8217;s approach allows for adaptability to different target hardware without retraining or custom kernels, offering significant efficiency improvements by supporting zero-redundancy tensor parallelism and improved per-token KV cache compression.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15250" target="_blank">https://huggingface.co/papers/2605.15250</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233933179.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM-based agents, model vulnerabilities, interaction timings, passive JavaScript tracker, randomised timing delays</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To determine if websites can passively identify the large language model (LLM) powering web browsing agents using behavioral patterns and timing data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study involves 14 frontier LLMs across four web environments, utilizing a passive JavaScript tracker to capture agent actions and interaction timings. Classifiers were trained on these actions to generalize across model sizes and families.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Passive identification of underlying models in web browsing agents is highly accurate (up to 96% F1). Classifier performance significantly degrades with randomised timing delays, but can largely recover when retrained, indicating a potential security risk regarding targeted attacks on model vulnerabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14786" target="_blank">https://huggingface.co/papers/2605.14786</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234128229.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocal</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multilingual Information Retrieval, Semantic Retrieval, Query-Language Preference, Language-Aware Metrics, Retrieval-Augmented Generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces MLAIRE, an evaluation protocol designed to enhance multilingual information retrieval by separating semantic retrieval accuracy from query-language preference.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; MLAIRE controls pools with parallel passages in different languages, measuring both semantic retrieval accuracy and query-language preference using new language-aware metrics like Language Preference Rate (LPR) and Lang-nDCG.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Standard retrieval metrics often obscure important differences: while some retrievers excel in semantic retrieval, they might return results in a non-query language; others may favor query-language preference but retrieve less relevant content.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07249" target="_blank">https://huggingface.co/papers/2605.07249</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234213431.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605181779147765.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ProofGrid, machine-checkable proofs, reasoning depth, epistemic instability, proof synthesis</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce ProofGrid, a benchmark suite for evaluating Large Language Model (LLM) reasoning using machine-checkable proofs to assess reasoning depth and stability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of tasks in proof writing, proof checking, proof masking, and proof gap-filling with minimal formal notation, particularly using Natural Deduction Language (NDL).</p>
<p>   &#8211; Development of an instrumented proof-checking pipeline that enhances measurement resolution by locating substantive reasoning failures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Results indicate frontier models show proficiency in foundational tasks but struggle with complex tasks requiring global combinatorial reasoning or low-level proof synthesis.</p>
<p>   &#8211; Identification of epistemic instability, where models produce flawed proofs yet correctly reject isolated local inferences, formalized with an Epistemic Stability Index.</p>
<p>   &#8211; Complementary analyses using 2PL IRT analyses, Wright maps, and a normalized task-discrimination measure based on Fisher information.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12524" target="_blank">https://huggingface.co/papers/2605.12524</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234235085.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. No One Knows the State of the Art in Geospatial Foundation Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Geospatial foundation models, Evaluation, Standardization, Model weights, Pretraining controls</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to address the lack of standardized evaluation and reporting in Geospatial Foundation Models (GFMs), which affects performance comparison and reproducibility.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; An audit of 152 papers revealing discrepancies in evaluations and protocols across different studies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The authors propose six concrete steps, including named-license weight release and shared core evaluations, to improve standardization and foster innovation in GFMs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12678" target="_blank">https://huggingface.co/papers/2605.12678</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234151819.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>25. AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AuralFuser, cross-modal influence, promptable segmentation, audio-guided contrastive loss</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To integrate audio into the Segment Anything Model 2 (SAM2) using the AuralFuser module to enhance cross-modal influence while preserving segmentation efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed AuralFuser to fuse audio and visual features, generating sparse and dense prompts guided by audio within SAM2&#8217;s feature pyramid.</p>
<p>   &#8211; Introduced an audio-guided contrastive loss for better alignment of auditory and visual modalities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Achieved notable accuracy improvements on public benchmarks with minimal impact on the interactive efficiency of promptable segmentation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2506.01015" target="_blank">https://huggingface.co/papers/2506.01015</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234109543.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>26. Follow the Mean: Reference-Guided Flow Matching</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Flow Matching, Controllable Generation, Reference-Mean Guidance, Semi-Parametric Guidance, AI-Generated Summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to demonstrate that flow matching enables controllable generation through example-based adaptation, providing an alternative to the traditional methods of fine-tuning and auxiliary networks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs two methods for controllable generation: Reference-Mean Guidance, which is training-free and applies a closed-form endpoint-mean correction to a pre-trained model, and Semi-Parametric Guidance, which uses a learned residual refiner to match model quality while allowing changes at inference time.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The findings suggest a paradigm shift towards generative models that adapt through data rather than parameter updates, offering a new control interface that relies on modifying the reference set rather than model weights.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10302" target="_blank">https://huggingface.co/papers/2605.10302</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518234028183.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>27. Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Quantization, Machine Unlearning, MANSU, Causal Circuit Attribution, Sparsity-Permanence Tradeoff</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper investigates how quantization affects machine unlearning and introduces the concept of a sparsity-permanence tradeoff.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research employs MANSU, combining causal circuit attribution, circuit-restricted null-space projection, and other techniques to address the limitations presented by quantization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MANSU effectively resolves issues with preserving forgetting and retention post-quantization, distinguishing structural erasure from behavioral suppression, and is validated across various models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15138" target="_blank">https://huggingface.co/papers/2605.15138</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233952977.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>28. OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Cross-embodiment video generation, Humanoid embodiments, Motion transfer, Embodiment-specific adaptation, Motion fidelity</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the research is to enable scalable adaptation of humanoid embodiments by factorizing motion transfer and embodiment-specific adaptation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop a framework called OmniHumanoid that learns a shared motion transfer model from motion-aligned paired videos across multiple embodiments and adapts to new ones using unpaired videos through lightweight embodiment-specific adapters.</p>
<p>   &#8211; Introduce a branch-isolated attention design to separate motion conditioning from embodiment-specific modulation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OmniHumanoid achieves strong motion fidelity and embodiment consistency, enabling scalable adaptation to unseen humanoid embodiments without retraining the shared motion model.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12038" target="_blank">https://huggingface.co/papers/2605.12038</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233915612.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>29. HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Sparse Mixture-of-Experts, harmonic kernel, HodgeCover, learning-free compression, simplicial topology</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces a novel compression approach for Sparse Mixture-of-Experts layers using harmonic kernel analysis to optimize expert merging patterns, enabling efficient inference without retraining.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method employs harmonic kernel analyses from simplicial topology and Hodge-decomposition of edge-barrier signals, combined with a hybrid variant of HodgeCover and weight pruning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The approach successfully achieves state-of-the-art performance in aggressive expert reduction on Sparse MoE backbones, indicating that the harmonic kernel is pivotal in improving compressor effectiveness in key scenarios.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13997" target="_blank">https://huggingface.co/papers/2605.13997</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233843122.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>30. Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Correction-Oriented Policy Optimization, Reinforcement Learning with Verifiable Rewards, reasoning capabilities, error correction, failed trajectories</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces Correction-Oriented Policy Optimization (CIPO) to enhance reinforcement learning by converting failed trajectories into correction supervision, thereby improving reasoning and error correction in large language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; By integrating correction samples derived from the model&#8217;s own failed attempts with the standard Reinforcement Learning with Verifiable Rewards (RLVR) objective, CIPO refines learning effectiveness and error correction capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Extensive experiments on 11 benchmarks in mathematical reasoning and code generation demonstrate that CIPO significantly surpasses existing baselines in both reasoning and correction performance, enhancing intrinsic reasoning capacity rather than merely adjusting existing correct answer probabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14539" target="_blank">https://huggingface.co/papers/2605.14539</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233807520.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>31. Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, foundation models, Transformer-based, autonomous design, AIRA-Compose, AIRA-Design</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to explore the autonomous design of foundation models that go beyond standard Transformers through a dual-framework approach, focusing on architectural search and mechanistic implementation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a dual-framework: AIRA-Compose for high-level architecture search and AIRA-Design for low-level mechanistic implementation, involved 11 agents for architecture search and 20 agents for designing attention mechanisms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The AI-designed architectures improved performance and efficiency, with AIRAformer-D and AIRAhybrid-D enhancing accuracy on downstream tasks and models such as AIRAformer-C scaling significantly faster. These frameworks demonstrate the potential for AI agents to discover architectures and optimizations that match or surpass human-designed baselines, paving the way toward recursive self-improvement.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15871" target="_blank">https://huggingface.co/papers/2605.15871</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233733631.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>32. MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision Language Action, egocentric datasets, mobile hardware, smartphone sensors</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop MobileEgo Anywhere, a framework for collecting extensive egocentric robot data using smartphone sensors for the large-scale training of Vision Language Action models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of modern smartphone sensor suites for long-term camera pose tracking, releasing a novel dataset of 200 hours of egocentric data, and providing an open-source mobile application and processing pipeline.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The framework lowers hardware barriers, democratizes data collection, and enables large-scale acquisition of diverse egocentric data, fostering the accelerated development of generalizable robotic policies.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05945" target="_blank">https://huggingface.co/papers/2605.05945</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260518233642520.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>33. Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SAE-FT, vision-language models, fine-tuning, Sparse Autoencoder, distribution shifts</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a novel method called SAE-FT for robust fine-tuning of vision-language models while improving robustness against distribution shifts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized sparse autoencoder constraints on visual representations to regularize changes, preventing the addition/removal of semantically significant features during fine-tuning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SAE-FT is computationally efficient and matches or exceeds state-of-the-art performance in ImageNet and distribution shift benchmarks, while maintaining interpretability and preventing catastrophic forgetting.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15961" target="_blank">https://huggingface.co/papers/2605.15961</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233608022.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>34. DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Models, AI-Generated Summary, Symbolic Rules, Failure Modes, Embedding-based Distractor Sampling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to evaluate whether large language models (LLMs) can effectively translate industrial monitoring rules into maintenance actions, focusing on their potential as decision support systems in complex environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of a benchmark consisting of 6,690 expert-validated multiple-choice questions based on 118 rule-action pairs across 16 asset types.</p>
<p>   &#8211; Implementation of a symbolic-to-MCQA pipeline normalizing rules to Disjunctive Normal Form, alongside embedding-based distractor sampling.</p>
<p>   &#8211; Evaluation of 29 LLMs and 4 embedding baselines, probing different failure modes such as brittleness and pattern-matching through five variants.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The top-performing LLMs are competitively close, although the best shows a significant advantage according to the Bradley-Terry Elo ranking.</p>
<p>   &#8211; Models exhibit vulnerabilities, losing accuracy when presented with expanded distractors and revealing pattern-matching tendencies under condition inversion.</p>
<p>   &#8211; The study identifies calibration, rather than capability, as a bottleneck in deploying these models for fault detection in industrial applications.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08614" target="_blank">https://huggingface.co/papers/2605.08614</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233535432.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>35. MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MetaAgent-X, Automatic Multi-Agent Systems, End-to-End Training, Reinforcement Learning, Stagewise Co-evolution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce MetaAgent-X, an end-to-end reinforcement learning framework that optimizes automatic multi-agent systems design and execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of Executor Designer Hierarchical Rollout and Stagewise Co-evolution to improve training stability and reveal the dynamics of co-evolution between designer and executor.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MetaAgent-X outperforms existing automatic MAS baselines with up to 21.7% gains, demonstrating that a stagewise co-evolution process is effective for building self-designing and self-executing agentic models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14212" target="_blank">https://huggingface.co/papers/2605.14212</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233458201.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>36. PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI Native, GUI agents, vision-language models, topology-aware agent, precision-sensitive tasks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve precision-sensitive tasks for GUI agents through a topology-aware framework that enhances task success with structured planning and pixel-level execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of PAGE Bench with 4,906 problems and pixel-level GUI actions.</p>
<p>   &#8211; Development of PAGER, a topology-aware agent utilizing dependency-structured planning and precision-aligned reinforcement learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PAGER significantly increases task success and step success rate compared to baseline models, establishing a new standard for point-precise GUI control.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15963" target="_blank">https://huggingface.co/papers/2605.15963</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233427685.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>37. From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: experiential framework, long-horizon image editing, reward-driven execution, coherence, reliability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose a new experiential framework for enhancing coherence and reliability in long-horizon image editing tasks through a combination of planning and reward-driven execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework employs a planner for generating structured atomic decompositions, and an orchestrator to select tools and regions for executing tasks, facilitated by a vision language judge to provide outcome-based rewards.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; By integrating planning with reward-driven execution, the proposed approach demonstrates more coherent and reliable edits compared to existing single-step or rule-based multistep methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15181" target="_blank">https://huggingface.co/papers/2605.15181</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233356046.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>38. Hölder Policy Optimisation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Group Relative Policy Optimisation, Hölder mean, token-level probability aggregation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance large language models by optimizing policy update mechanisms through a novel framework called HölderPO.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced HölderPO framework leveraging the Hölder mean to unify token-level probability aggregation, with dynamic annealing to adjust parameter p for optimal trade-off management.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; HölderPO provides superior stability and convergence, achieving a state-of-the-art average accuracy of 54.9% on mathematical benchmarks and a 93.8% success rate on ALFWorld, outperforming standard GRPO with a 7.2% relative gain.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12058" target="_blank">https://huggingface.co/papers/2605.12058</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233328075.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>39. Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, Strategy Nudging, Verifiable Rewards, Exploration, Large Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve reasoning capabilities in large language models by enhancing reinforcement learning with verifiable rewards through the NudgeRL framework, which uses structured exploration and strategy nudging.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce Strategy Nudging to condition each rollout on strategy-level contexts for diverse reasoning trajectories.</p>
<p>   &#8211; Propose a unified objective that decomposes reward signals and incorporates a distillation objective for transferring behaviors to the base policy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; NudgeRL outperforms standard GRPO with larger rollout budgets and exceeds performance of oracle-guided RL baselines across multiple math benchmarks, demonstrating an efficient alternative to brute-force scaling and privileged information methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15726" target="_blank">https://huggingface.co/papers/2605.15726</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233248963.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>40. InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: InsightTok, discrete visual tokenization, perceptual losses, autoregressive image generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper targets the improvement of text and face reconstruction in visual generation by addressing the limitations of standard discrete-tokenizer objectives.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A novel framework called InsightTok is introduced, utilizing localized, content-aware perceptual losses, along with a compact 16k codebook and a 16x downsampling rate to enhance text and face fidelity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; InsightTok significantly surpasses previous tokenizers in reconstructing text and face details without sacrificing general image reconstruction quality, demonstrating the benefits of specialized supervision in tokenizer training.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14333" target="_blank">https://huggingface.co/papers/2605.14333</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233216773.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>41. DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DexJoCo, dexterous manipulation, benchmark, toolkit, dexterous hands</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To establish a benchmark and toolkit, DexJoCo, for evaluating dexterous manipulation tasks, emphasizing tool-use, bimanual coordination, and long-horizon execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of a benchmark and toolkit with 11 functional tasks.</p>
<p>   &#8211; Implementation of a low-cost data collection system generating 1.1K task trajectories.</p>
<p>   &#8211; Application of domain randomization for robustness assessment, alongside diverse benchmarks including visual and dynamics randomization, multi-task training, and action-head adaptation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identification of key challenges and common limitations in current dexterous manipulation policies, providing insights for future research directions in dexterous hand robot learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.16257" target="_blank">https://huggingface.co/papers/2605.16257</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260518233136403.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>42. FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FashionChameleon, motion coherence, AI-generated summary, real-time generation, Teacher Model</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to achieve real-time interactive multi-garment video customization while preserving motion coherence, specifically important for applications in e-commerce and content creation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The FashionChameleon framework employs three key techniques: </p>
<p>     1. A Teacher Model with In-Context Learning using single-garment video data to ensure coherence during garment switching.</p>
<p>     2. Streaming Distillation and In-Context Learning for consistency and efficiency.</p>
<p>     3. A Training-Free KV Cache Rescheduling method to allow seamless garment switching while maintaining motion coherence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FashionChameleon enables interactive customization and consistent long-video extrapolation, achieving real-time generation rates significantly faster than existing methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15824" target="_blank">https://huggingface.co/papers/2605.15824</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233106228.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>43. PhysBrain 1.0 Technical Report</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PhysBrain 1.0, physical commonsense supervision, Vision-language-action models, embodied control tasks, language-sensitive adaptation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Leverage human egocentric video to generate physical commonsense supervision for improving vision-language-action models in embodied control tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employ a data engine to convert large-scale human egocentric video into structured supervision by extracting scene elements, spatial dynamics, action execution, and depth-aware relations, and turn these into question-answer style supervision for training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PhysBrain 1.0 achieves state-of-the-art results across several benchmarks, showing strong performance, especially in out-of-domain scenarios, suggesting that scaling physical commonsense from human interaction video enhances multimodal understanding and robot action execution.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15298" target="_blank">https://huggingface.co/papers/2605.15298</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260518233031960.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260518233305933.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260518233701033.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260518233642520.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260518233136403.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260515</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260515/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Sat, 16 May 2026 00:41:35 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260515/</guid>

					<description><![CDATA[1. Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling 🔑 Keywords: Reasoning Models, Reverse-Perplexity Curriculum, Reinforcement Learning, International Mathematical Olympiad, International [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reasoning Models, Reverse-Perplexity Curriculum, Reinforcement Learning, International Mathematical Olympiad, International Physics Olympiad</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to transform post-trained reasoning models into high-performing, olympiad-level solvers capable of achieving gold-medal performance in mathematical and physics competitions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of reverse-perplexity curriculum to develop rigorous proof-search and self-checking behaviors.</p>
<p>   &#8211; Implementation of a two-stage reinforcement learning pipeline, progressing from verifiable rewards to proof-level reinforcement learning.</p>
<p>   &#8211; Enhancement of solving capabilities using test-time scaling techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The developed model, SU-01, successfully performs at a gold-medal level on various competitions, including IMO and IPhO, and shows strong generalization of scientific reasoning beyond mathematics and physics.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13301" target="_blank">https://huggingface.co/papers/2605.13301</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233009235.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. Self-Distilled Agentic Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI Native, Reinforcement Learning, On-Policy Self-Distillation, token-level guidance, sigmoid gate</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance reinforcement learning for multi-turn agent training by integrating On-Policy Self-Distillation with a novel sigmoid gate mechanism.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed Self-Distilled Agentic Reinforcement Learning (SDAR) which utilizes a sigmoid gate to manage the transfer of token-level guidance from a teacher branch, prioritizing positive signals while minimizing negative teacher rejections.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SDAR significantly improves performance across several domains such as ALFWorld, WebShop, and Search-QA, showing marked improvements over traditional GRPO and other RL&#8211;OPSD methods, by effectively managing the instability issues present in naive GRPO+OPSD approaches.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15155" target="_blank">https://huggingface.co/papers/2605.15155</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233038992.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SANA-WM, Hybrid Linear Attention, Dual-Branch Camera Control, Two-Stage Generation Pipeline, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce SANA-WM, an efficient 2.6B-parameter world model capable of generating high-fidelity, 720p video with precise camera control, achieving industrial-level quality and efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a hybrid attention mechanism combining frame-wise Gated DeltaNet and softmax attention for efficient long-context modeling.</p>
<p>   &#8211; Implementation of Dual-Branch Camera Control to adhere to precise 6-DoF trajectories.</p>
<p>   &#8211; Application of a Two-Stage Generation Pipeline with a long-video refiner for improved quality and consistency across sequences.</p>
<p>   &#8211; Employing a Robust Annotation Pipeline that extracts camera poses for accurate spatiotemporal action labels.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SANA-WM demonstrates notable efficiency in data usage, training times, and hardware requirements compared to prior models, with stronger action-following accuracy and comparable visual quality at significantly higher throughput.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15178" target="_blank">https://huggingface.co/papers/2605.15178</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233115507.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Darwin Family, evolutionary merging, gradient-free, reasoning performance, Architecture Mapper</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate whether reasoning performance in large language models can be improved without additional training using evolutionary merging techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a 14-dimensional adaptive merge genome for fine-grained recombination.</p>
<p>   &#8211; Employed MRI-Trust Fusion to balance layer-importance signals with evolutionary search.</p>
<p>   &#8211; Developed an Architecture Mapper to enable cross-architecture breeding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Darwin-27B-Opus achieved high performance on GPQA Diamond without gradient-based training, surpassing its fully trained counterparts.</p>
<p>   &#8211; Demonstrated consistent improvement over parent models across varying scales, supporting recursive multi-generation evolution.</p>
<p>   &#8211; Showed that training-free evolutionary merging can be a cost-effective alternative for reasoning-centric language models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14386" target="_blank">https://huggingface.co/papers/2605.14386</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233141851.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Models, Personalized Memory, Implicit Conflict, State Resolution, State-aware Memory</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate the ability of large language models to update personalized memories and resolve implicit conflicts when presented with new evidence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced STALE, a benchmark consisting of 400 expert-validated conflict scenarios and a three-dimensional probing framework to test State Resolution, Premise Resistance, and Implicit Policy Adaptation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; There exists a notable gap between retrieving updated evidence and acting on it in large language models, with the best model achieving only 55.2% accuracy. A prototype called CUPMem demonstrated potential for enhancing robust memory by strengthening write-time revision through structured state consolidation and propagation-aware search.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06527" target="_blank">https://huggingface.co/papers/2605.06527</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233208669.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Warp-as-History, camera-controlled video generation, zero-shot capability, positional encoding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to develop Warp-as-History, enabling camera-controlled video generation without training or test-time optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; This method transforms camera-induced warps into pseudo-history representations with target-frame positional alignment and visible-token selection.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The approach demonstrates a zero-shot capability of a video generation model to adhere to camera trajectories, enhanced by lightweight LoRA finetuning for better camera adherence and visual quality.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15182" target="_blank">https://huggingface.co/papers/2605.15182</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260515233233438.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. PREPING: Building Agent Memory without Tasks</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: pre-task memory, synthetic practice, proposer-guided memory construction, procedural memory, cold-start gap</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore whether agents can build procedural memory using self-generated synthetic practice before encountering any task-specific experiences in new environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of Preping framework, which utilizes proposer-guided synthetic tasks, Solver execution, and Validator feedback to construct memory efficiently.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Preping improves agent performance in new environments compared to a no-memory baseline and is competitive with playbook-based methods, reducing deployment costs significantly by controlling feasibility, redundancy, and selective memory updates.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13880" target="_blank">https://huggingface.co/papers/2605.13880</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233307169.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Realiz3D, domain gap, diffusion models, 3D-consistent, photorealistic</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to bridge the domain gap between synthetic renders and real images in 3D-consistent image generation through the Realiz3D framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers developed a lightweight framework using diffusion models, residual adapters, and layer-specific denoising strategies to decouple visual domain from control signals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Realiz3D effectively maintains realism in generated images while applying control signals, enhancing output consistency and photorealism across tasks like text-to-multiview generation and texturing.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13852" target="_blank">https://huggingface.co/papers/2605.13852</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260515233336130.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multi-agent pathfinding, Reinforcement learning, Imitation learning, Feature sharing, Scalability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a scalable and efficient solution for Multi-agent pathfinding (MAPF) with enhanced coordination through a learnable communication module.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces the Local Communication for Multi-agent Pathfinding (LC-MAPF) model, applying multi-round communication between neighboring agents using a pre-trained approach to solve the MAPF problem effectively.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LC-MAPF demonstrates superior performance and improved agent coordination compared to existing learning-based MAPF solvers without sacrificing scalability, even in diverse unseen scenarios.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07637" target="_blank">https://huggingface.co/papers/2605.07637</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233415299.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. Long Context Pre-Training with Lighthouse Attention</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Lighthouse Attention, causal transformers, sequence length, scaled dot-product attention, hierarchical attention</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enable efficient training of causal transformers at long sequences by using Lighthouse Attention to reduce computational complexity while maintaining model performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced a hierarchical selection-based attention algorithm that wraps around ordinary scaled dot-product attention (SDPA), incorporating subquadratic hierarchical pre- and post-processing, a symmetrical compression strategy, and a two-stage training approach.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Preliminary experiments with Lighthouse Attention show faster total training time and lower final loss compared to full attention training, validating the effectiveness of the method.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06554" target="_blank">https://huggingface.co/papers/2605.06554</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233446085.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: IntentVLA, short-horizon intents, partial observability, history-conditioned, ambiguity-aware benchmark</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to enhance robot imitation learning stability through IntentVLA, a framework that encodes short-horizon intents from visual observations to address challenges of partial observability and ambiguous observations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers introduced a history-conditioned VLA framework and developed AliasBench, a 12-task benchmark, to isolate short-horizon observation aliasing across different test environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; IntentVLA successfully improves rollout stability and outperforms existing VLA baselines by effectively managing imitation learning in multimodal settings.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14712" target="_blank">https://huggingface.co/papers/2605.14712</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233513847.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. ViMU: Benchmarking Video Metaphorical Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Video Understanding Models, Implicit Meaning, Social Contexts, ViMU, Multimodal Evidence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to address the limitation of current video understanding models in interpreting implicit meanings and social contexts that go beyond literal visual comprehension.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of ViMU, a benchmark specifically designed to evaluate the subtext understanding capabilities of advanced video understanding models using multimodal evidence, is proposed. It utilizes both open-ended and multiple-choice questions without disclosing key evidence beforehand.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Existing models primarily focus on literal comprehension such as object and action recognition, lacking systematic understanding of metaphorical, ironic, and social meanings. ViMU provides a means to assess if models can infer implicit meanings grounded in context and social experiences.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14607" target="_blank">https://huggingface.co/papers/2605.14607</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233547293.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Autoregressive video diffusion, attention complexity, KV cache compression, static heads, dynamic heads</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address scalability issues in Autoregressive video diffusion models by optimizing attention head caching through a novel hybrid compression strategy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of a hybrid KV cache compression strategy called Forcing-KV that categorizes attention heads into static and dynamic, applying structured static pruning and dynamic pruning based on segment-wise similarity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method significantly reduces cache memory usage by 30% and improves generation speed, achieving over 29 frames per second on a single NVIDIA H200 GPU and providing substantial speedups at various resolutions, demonstrating effective scalability enhancements.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09681" target="_blank">https://huggingface.co/papers/2605.09681</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233616473.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. Does Synthetic Layered Design Data Benefit Layered Design Decomposition?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Synthetic data, Layered design editing, Vision language models, Graphic design decomposition, Data-centric study</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate whether pure synthetic layered data can enhance graphic design decomposition by providing scalable training and improved layer distribution control.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conducted a data-centric study using the CLD baseline framework and the creation of a synthetic dataset, SynLayers.</p>
<p>   &#8211; Utilized vision language models for generating textual supervision and automated inference inputs through VLM-predicted bounding boxes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Pure synthetic data outperforms non-scalable traditional datasets, proving its viability as a scalable substitute.</p>
<p>   &#8211; Increasing training data scale leads to improved performance, with gains saturating at around 50K samples.</p>
<p>   &#8211; Synthetic data allows balanced control over layer-count distributions, addressing imbalances seen in real-world datasets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15167" target="_blank">https://huggingface.co/papers/2605.15167</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233654584.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. BOOKMARKS: Efficient Active Storyline Memory for Role-playing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: search-based memory, role-playing agents, bookmarks, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose a search-based memory framework named BOOKMARKS that enhances role-playing agents by actively managing task-relevant information.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method involves initializing, maintaining, and updating structured bookmarks to capture character behaviors and story elements effectively.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; BOOKMARKS significantly outperforms existing role-playing agent memory systems, offering advantages like active grounding and passive updating for task-specific details.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14169" target="_blank">https://huggingface.co/papers/2605.14169</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233726877.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Adaptive Teacher Exposure, Self-Distillation, Beta-policy controller, Learning-progress reward</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to improve large language model reasoning by dynamically adjusting teacher exposure during training through a learnable policy controller, identified as Adaptive Teacher Exposure for Self-Distillation (ATESD).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; ATESD models the reveal ratio using a lightweight Beta-policy controller, conditioned on compact training-state statistics. It optimizes this controller with a discounted learning-progress reward to improve the student&#8217;s future performance rather than just focusing on immediate loss changes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate that ATESD consistently outperforms existing self-distillation and RL baselines in reasoning tasks, establishing adaptive teacher exposure as an effective new axis for reasoning self-distillation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11458" target="_blank">https://huggingface.co/papers/2605.11458</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233828871.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated summary, PhyMotion, Video Generation, Reinforcement Learning, Physics Simulator</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance the realism of AI-generated human motion in videos by introducing a physics-grounded reward system called PhyMotion.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; PhyMotion evaluates 3D human trajectories using the MuJoCo physics simulator across three axes: kinematic plausibility, contact consistency, and dynamic feasibility.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PhyMotion shows stronger correlation with human judgments compared to existing rewards, leading to improved motion realism in videos with both automatic metrics and human evaluations, and achieves improvements with modest training overhead.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14269" target="_blank">https://huggingface.co/papers/2605.14269</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233755187.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: geometry-first methodology, geometric constraints, perspective-view training strategy, 3D accuracy, photorealism</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to generate street-level 3D scenes from satellite images with improved geometric accuracy and photorealism.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs a geometry-first approach, incorporating novel geometric constraints and a perspective-view training strategy to enhance the feed-forward image-to-3D paradigm.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The new approach significantly enhances geometric accuracy and photorealism, outperforming existing methods such as Sat2Density++ and improving metrics like RMSE and FID. The method&#8217;s versatility is demonstrated through various applications, including semantic-map-to-3D synthesis and DSM estimation, with the code available on GitHub.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14984" target="_blank">https://huggingface.co/papers/2605.14984</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260515233933224.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. Aligning Latent Geometry for Spherical Flow Matching in Image Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Latent Flow Matching, Geodesic Flow, Image Generation, Spherical Linear Interpolation, Variational Autoencoder</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to improve image generation by using geodesic flow matching, which involves projecting latent points onto fixed radius spheres.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; It involves decomposition of latent points into radial and angular components, and applying spherical linear interpolation instead of linear paths for image generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The method enhances class-conditional ImageNet-256 FID scores consistently, maintains the diffusion architecture intact, and eliminates the need for auxiliary components.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15193" target="_blank">https://huggingface.co/papers/2605.15193</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233900844.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. Ideology Prediction of German Political Texts</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: transformer-based model, political orientation, multiclass classifiers, DeBERTa-large, Gemma2-2B</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to create a transformer-based model to project political orientation on a continuous spectrum from left to right using multiple text corpora.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed and evaluated a transformer model using four distinct corpora, including German Bundestag plenary notes and Wahl-O-Mat data, focusing on mitigating overfitting via separate training and testing datasets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research concludes that transformer models, such as DeBERTa-large and Gemma2-2B, are effective in recognizing political framing across various text sources, highlighting the importance of model architecture and domain-specific training data alongside model size for estimating political bias.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14352" target="_blank">https://huggingface.co/papers/2605.14352</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234043013.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. LLM-based Detection of Manipulative Political Narratives</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Manipulative political narratives, Social media, Prompt-based filtering, Unsupervised clustering, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a computational framework for detecting and structuring manipulative political narratives from social media posts without relying on predefined categories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented a few-shot prompt-based filtering to distinguish manipulative narratives from legitimate critiques.</p>
<p>   &#8211; Utilized dimensionality reduction techniques and unsupervised clustering methods such as UMAP and HDBSCAN to identify distinct narrative clusters.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Successfully identified 41 distinct manipulative narrative clusters from over 1.2 million social media posts by integrating prompt-based filtering with unsupervised clustering.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14354" target="_blank">https://huggingface.co/papers/2605.14354</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234011905.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. RewardHarness: Self-Evolving Agentic Post-Training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RewardHarness, Agentic Reward Framework, Context Evolution, Image Edit Evaluation, Preference Demonstrations</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary objective is to develop RewardHarness, a framework designed to enhance image edit evaluation by utilizing limited human demonstrations, moving beyond traditional large-scale preference annotation models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework reimagines reward modeling through context evolution rather than weight optimization. It builds a library of tools and skills that iteratively improves with minimal human input, using a sophisticated Orchestrator and a frozen Sub-Agent to generate preference judgments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RewardHarness successfully achieves superior accuracy compared to existing models, even with minimal data usage. It records a 47.4% average accuracy in benchmarks, surpassing GPT-5 by 5.3 points, demonstrating its efficacy as a reward signal in RL-tuned models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08703" target="_blank">https://huggingface.co/papers/2605.08703</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234115682.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Closed-Loop Visual Reasoning, Proxy Prompt Reinforcement Learning, Δ-Space Weight Merge, Multi-Step Reasoning, Pixel-Level Diffusion Generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance complex image synthesis by integrating visual-language planning with pixel-level diffusion generation, addressing challenges in latency and optimization in text-to-image models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The Closed-Loop Visual Reasoning (CLVR) framework is introduced, combining visual-language logical planning with pixel-level diffusion generation.</p>
<p>   &#8211; Innovations such as Proxy Prompt Reinforcement Learning (PPRL) are used for optimizing long-context scenarios by distilling multimodal histories.</p>
<p>   &#8211; Δ-Space Weight Merge (DSWM) method is proposed to reduce inference costs without expensive re-distillation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The CLVR framework demonstrates superior performance over existing open-source baselines and approaches the capabilities of proprietary commercial models, enabling effective general test-time scaling for complex visual generation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14876" target="_blank">https://huggingface.co/papers/2605.14876</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234222777.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: BEAM, Mixture-of-Experts, token-adaptive, binary masks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the efficiency of Mixture-of-Experts models by introducing BEAM, a method that utilizes trainable binary masks for dynamic expert selection, aiming to reduce computation while maintaining high performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of BEAM which incorporates trainable binary masks and uses a straight-through estimator and auxiliary regularization loss for end-to-end training.</p>
<p>   &#8211; Implementation of a custom CUDA kernel for integration with the vLLM inference framework to ensure efficient performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; BEAM retains over 98% of the original model&#8217;s performance while reducing MoE layer FLOPs by up to 85%.</p>
<p>   &#8211; Achieves up to 2.5 times faster decoding and 1.4 times higher throughput, proving its effectiveness as a practical solution for efficient MoE inference.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14438" target="_blank">https://huggingface.co/papers/2605.14438</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234152715.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>25. Quantitative Video World Model Evaluation for Geometric-Consistency</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PDI-Bench, Generative Models, Geometric Coherence, Monocular Reconstruction, Projective-Geometry Residuals</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate geometric coherence in AI-generated videos using a new quantitative framework named PDI-Bench, aiming to identify geometry-specific failure modes in video generators.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study uses segmentation and point tracking to obtain object-centric observations, which are then converted to 3D coordinates through monocular reconstruction. This process allows for computing projective-geometry residuals to capture failure dimensions such as scale-depth alignment, 3D motion consistency, and structural rigidity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PDI-Bench exposes consistent geometry-specific failure modes across state-of-the-art video generators, which are not detected by standard perceptual metrics. This framework serves as a diagnostic tool toward achieving physically grounded video generation, highlighting its importance given the limitations of current evaluation methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15185" target="_blank">https://huggingface.co/papers/2605.15185</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260515234248983.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>26. CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Hierarchical Topological Reasoning, Visual Reasoning, Benchmarks, Fine-Tuning, Vision-Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce CurveBench as a new benchmark for evaluating hierarchical topological reasoning from visual input using non-intersecting Jordan curves.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Structured prediction task using a model to recover the rooted containment tree from images.</p>
<p>   &#8211; Evaluating models like Gemini 3.1 Pro and fine-tuning vision-language models such as Qwen3-VL-8B.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Despite simplicity, models show low accuracy on CurveBench, indicating challenges in topology-aware visual reasoning.</p>
<p>   &#8211; Fine-tuned models show improved performance, yet there remains a significant gap, especially in difficult tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14068" target="_blank">https://huggingface.co/papers/2605.14068</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234357337.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>27. PreScam: A Benchmark for Predicting Scam Progression from Early Conversations</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Conversational scams, Scam progression, Scam kill chain, Psychological actions, Real-time termination prediction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces the PreScam benchmark, designed to model the progression of scams through multi-turn conversations to understand how real-world scams evolve over time.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research structures user-submitted scam reports into conversational scam instances following a hierarchy based on the scam lifecycle, annotated with psychological actions and victim responses. It benchmarks models on tasks such as real-time termination prediction and scammer action prediction.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current models show capability in capturing some scam-related cues yet face challenges in accurately tracking risk escalation and manipulation tactics over multiple conversation turns. Supervised encoders outperform zero-shot language models in predicting conversation termination stages.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12243" target="_blank">https://huggingface.co/papers/2605.12243</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234329842.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>28. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605151778888644.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>29. Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-Language-Action, temporal dynamics, quadratic optimization, training-free correction, Pace-and-Path Correction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve Vision-Language-Action models&#8217; performance in dynamic environments by addressing temporal blindness without needing retraining.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of a training-free, closed-form inference-time operator through quadratic optimization to simultaneously correct pace and path dynamics.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method significantly outperforms existing training-free wrappers and dynamic-adaptive methods, enhancing success rates up to 28.8% and 25.9% over foundational VLA models in dynamic-only and mixed environments, respectively.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11459" target="_blank">https://huggingface.co/papers/2605.11459</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234343469.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>30. SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DAG planning, execution control, AssetOpsBench, MCP Bench, LLM agent systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve task execution and plan validity in industrial LLM agent systems by integrating validated DAG planning with prefix-based execution control.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a planning wrapper named SPIN that combines validated Directed Acyclic Graph (DAG) planning with prefix-based execution control. It validates and repairs plans before execution and evaluates DAG prefixes incrementally.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SPIN effectively reduces executed tasks and tool calls, improving the accomplishment rate on AssetOpsBench and enhances planning, grounding, and dependency scores on MCP Bench, benefiting systems like GPT OSS1 and Llama 4 Maverick.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14051" target="_blank">https://huggingface.co/papers/2605.14051</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234312558.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>31. Nexus : An Agentic Framework for Time Series Forecasting</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Nexus, multi-agent forecasting, contextual information, LLMs, temporal fluctuations </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop Nexus, a multi-agent forecasting framework that enhances accuracy and explainability by decomposing time series prediction into specialized stages to effectively integrate numerical patterns and contextual information.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework isolates macro-level and micro-level temporal fluctuations, integrates contextual information when available, and synthesizes a final forecast. It adapts to both seasonal signals and volatile, event-driven information without relying on external anchors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Nexus demonstrates superior forecasting performance compared to state-of-the-art TSFMs and LLM baselines by utilizing current-generation LLMs&#8217; intrinsic abilities and provides reasoning traces that explain forecast drivers, solidifying that real-world forecasting requires agentic reasoning beyond sequence modeling.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14389" target="_blank">https://huggingface.co/papers/2605.14389</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234235720.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>32. LiSA: Lifelong Safety Adaptation via Conservative Policy Induction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Adaptive safety guardrails, Policy abstractions, Evidence-aware confidence gating, Sparse feedback, Noisy user feedback</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study proposes LiSA (Lifelong Safety Adaptation), a framework that enhances safety guardrails for AI agents by adapting to their operating environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; LiSA utilizes policy abstractions to generalize sparse feedback, incorporates conflict-aware local rules, and applies evidence-aware confidence gating to improve robustness against noisy feedback.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LiSA demonstrates superior performance compared to memory-based baselines under sparse and noisy feedback conditions, offering a practical approach to safeguard AI agents against real-world risks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14454" target="_blank">https://huggingface.co/papers/2605.14454</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234207068.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>33. Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FEST, Reinforcement Learning, Few-Shot, Supervised Fine-Tuning, AI Native</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce FEST, a few-shot demonstration-guided reinforcement learning algorithm that combines supervised signals, on-policy learning, and weighted training to achieve high performance with minimal data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize only 128 demonstrations from a Supervised Fine-Tuning dataset, employing a combination of supervised signal, on-policy signal, and decaying weights to prevent overfitting and ensure sample efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FEST achieves strong performance on several benchmarks, surpassing baseline methods with significantly less data while matching their performance when a full dataset is used.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15012" target="_blank">https://huggingface.co/papers/2605.15012</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234135657.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>34. Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Omni-modal language models, visual shortcuts, post-training, OmniClean, self-distillation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study investigates whether current omni-modal benchmarks effectively differentiate visual shortcuts from genuine audio-visual-language evidence integration, and examines the impact of post-training techniques in visually debiased evaluation settings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Nine omni-modal benchmarks were audited using visual-only probing, retaining queries that could not be solved visually to create OmniClean. A post-training process, OmniBoost, involving mixed bi-modal SFT, mixed-modality RLVR, and self-distillation on the Qwen2.5-Omni-3B model was evaluated.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research concludes that omni-modal progress is more comprehensible with controlled visual leakage during evaluation. Additionally, small omni-modal models benefit from staged post-training, demonstrating competitive performance even without a stronger omni-modal teacher.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12034" target="_blank">https://huggingface.co/papers/2605.12034</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234058465.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>35. FutureSim: Replaying World Events to Evaluate Adaptive Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FutureSim, world events, forecasting performance, test-time adaptation, reasoning about uncertainty</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study introduces FutureSim to evaluate AI agents&#8217; long-term predictive capabilities by simulating real-world event sequences.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method involves using grounded simulations where AI agents forecast world events beyond their knowledge cutoff by interacting with a chronological replay of real news articles and questions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FutureSim demonstrated that AI agents exhibit significant performance gaps in forecasting, with best agents achieving 25% accuracy. It also highlighted areas for future research such as long-horizon test-time adaptation and reasoning about uncertainty.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15188" target="_blank">https://huggingface.co/papers/2605.15188</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515234027516.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>36. Dynamic Latent Routing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MDP, General Dijkstra Search, Dynamic Latent Routing, time-varying rewards, supervised fine-tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore the temporal composition of sub-policies in Markov Decision Processes (MDPs) with time-varying reward functions and introduce a novel method for language model post-training called Dynamic Latent Routing (DLR).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized General Dijkstra Search (GDS) to demonstrate the recovery of globally optimal policies through intermediate optimal sub-policies.</p>
<p>   &#8211; Proposed DLR, integrating discrete latent codes and routing policies through a dynamic search, enhancing model parameters in a single training stage.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Dynamic Latent Routing (DLR) effectively achieves superior performance in low-data fine-tuning scenarios, outperforming traditional supervised fine-tuning methods across various datasets and models with a significant average gain of +6.6 percentage points.</p>
<p>   &#8211; DLR demonstrates structured routing behaviors, surpassing existing discrete-latent baselines that typically fall short of supervised fine-tuning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14323" target="_blank">https://huggingface.co/papers/2605.14323</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233954857.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>37. Topology-Preserving Neural Operator Learning via Hodge Decomposition</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Hodge orthogonality, spectral interference, topological degrees of freedom, Hybrid Eulerian-Lagrangian architecture, Hodge Spectral Duality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This research focuses on analyzing physical field equations on geometric meshes and aims to improve accuracy and efficiency through a hybrid Eulerian-Lagrangian architecture that distinctly separates topological and geometric components.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study utilizes Hodge theory and operator splitting to achieve a principled operator-level decomposition, using discrete differential forms and orthogonal auxiliary spaces to capture and represent various dynamics.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The developed Hybrid Eulerian-Lagrangian architecture, known as Hodge Spectral Duality (HSD), significantly enhances accuracy and efficiency on geometric graphs, maintaining fidelity to physical invariants.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13834" target="_blank">https://huggingface.co/papers/2605.13834</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233918695.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>38. WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WildTableBench, multimodal foundation models, table images, structural perception, numerical reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce WildTableBench as the first question-answering benchmark for real-world table images, addressing challenges in structural perception and numerical reasoning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Evaluation of 21 proprietary and open-source multimodal foundation models on the newly introduced WildTableBench, which contains 402 high-information-density table images and 928 manually annotated questions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The evaluation revealed significant challenges, with only one model exceeding 50% accuracy. Persistent weaknesses in structural perception and reasoning were identified, highlighting the benchmark&#8217;s value as a diagnostic tool for understanding capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01018" target="_blank">https://huggingface.co/papers/2605.01018</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233845942.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>39. PRISM: Prior Rectification and Uncertainty-Aware Structure Modeling for Diffusion-Based Text Image Super-Resolution</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PRISM, Text-SR, Flow-Matching Prior Rectification, Uncertainty-aware Residual Encoding, diffusion-based</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to develop a text super-resolution framework, PRISM, designed to improve accuracy even under severe degradation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs a diffusion-based framework integrating Flow-Matching Prior Rectification and a Structure-guided Uncertainty-aware Residual Encoder to address challenges in Text-SR.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PRISM demonstrates state-of-the-art performance in both synthetic and real-world benchmarks with efficient, millisecond-level inference, offering improved text fidelity and readability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13027" target="_blank">https://huggingface.co/papers/2605.13027</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233811710.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>40. Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-improving language models, Environment-construction loop, Stable solve-verify asymmetry, EvoEnv, Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to develop self-improving language models that focus on constructing training environments rather than simply generating data, leveraging the concept of stable solve-verify asymmetry to ensure informative learning rewards.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The approach, as demonstrated in the EvoEnv system, involves synthesizing Python environments from seeds, conducting staged validation, semantic self-review, difficulty calibration, and novelty checks to maintain the challenging nature of tasks relative to the model’s capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study finds that models achieve stable self-improvement not by producing more synthetic data, but by creating environments that remain structurally challenging. The EvoEnv method shows improvement in performance, increasing average metrics from 72.4 to 74.8, indicating a 3.3% relative gain.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14392" target="_blank">https://huggingface.co/papers/2605.14392</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233741330.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>41. PanoWorld: Towards Spatial Supersensing in 360^circ Panorama World</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PanoWorld, spherical spatial cross-attention, panoramic reasoning, equirectangular projection, geometry-aware</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate pano-native understanding requiring MLLMs to reason over ERP panoramas as continuous, observer-centered spaces.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Defined key abilities for pano-native understanding, constructed a metadata pipeline for converting ERP panoramas into geometry-aware and depth-aware data, and introduced PanoWorld with Spherical Spatial Cross-Attention.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PanoWorld significantly outperforms existing baselines on several benchmarks, demonstrating that effective panoramic reasoning needs pano-native supervision and model adaptation focusing on geometry awareness.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13169" target="_blank">https://huggingface.co/papers/2605.13169</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233712220.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>42. RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RAVEN, causal autoregressive, video generation, reinforcement learning, CM-GRPO</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop RAVEN to enable real-time video generation using causal autoregressive extrapolation to improve training alignment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce a new framework that aligns training with inference by repacking each self rollout into an interleaved sequence of clean and noisy states.</p>
<p>   &#8211; Propose CM-GRPO using reinforcement learning applied directly to a conditional Gaussian transition for consistency sampling.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RAVEN demonstrates superior performance over existing causal video distillation models in quality and dynamic evaluations.</p>
<p>   &#8211; CM-GRPO enhances RAVEN’s performance further when combined, optimizing through a novel reinforcement learning approach.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15190" target="_blank">https://huggingface.co/papers/2605.15190</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260515233631017.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>43. Orchard: An Open-Source Agentic Modeling Framework</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Agentic modeling, AI-generated summary, open-source framework, scalable agent training, Credit-assignment SFT</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce Orchard, an open-source framework designed for scalable agentic modeling to train diverse autonomous agents effectively.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employed Orchard Env, a lightweight environment service with reusable primitives for lifecycle management.</p>
<p>   &#8211; Developed three agentic modeling recipes: Orchard-SWE for coding, Orchard-GUI for vision-language tasks, and Orchard-Claw for personal assistance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated that an open-source, harness-agnostic environment can support scalable agentic data and training across domains, achieving state-of-the-art performance metrics among comparable open-source models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15040" target="_blank">https://huggingface.co/papers/2605.15040</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233601966.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>44. VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: 3D scene editing, depth-synchronized text injection, geometric displacements, DeltaScene Dataset</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces VGGT-Edit, a framework for text-conditioned native 3D scene editing, aiming to improve the quality and efficiency over existing 2D-lifting methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; VGGT-Edit employs depth-synchronized text injection for aligning semantic guidance with spatial poses and a residual transformation head to predict direct geometric displacements, preserving the scene&#8217;s structure and stability.</p>
<p>   &#8211; The approach is reinforced with a multi-term objective function to ensure geometric accuracy and multi-view consistency. Additionally, a new dataset, the DeltaScene Dataset, is constructed to validate quality and efficacy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; VGGT-Edit significantly outperforms traditional 2D-lifting baselines, offering sharper object details, enhanced multi-view consistency, and rapid inference capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15186" target="_blank">https://huggingface.co/papers/2605.15186</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233530925.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>45. DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DiffusionOPD, multi-task training, online policy distillation, diffusion models, stochastic SDE</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to enhance multi-task training efficiency for diffusion models using a novel approach, DiffusionOPD, leveraging online policy distillation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; DiffusionOPD involves independently training task-specific teachers and distilling their knowledge into a unified student. It extends the OPD framework from discrete tokens to continuous-state Markov processes by deriving a closed-form per-step KL objective.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DiffusionOPD outperforms existing reinforcement learning approaches in both training efficiency and final performance, achieving state-of-the-art results in multi-task settings.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15055" target="_blank">https://huggingface.co/papers/2605.15055</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233459979.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>46. FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FrontierSmith, Open-ended coding, LLM coding, Competitive programming, Idea divergence metric</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To automate the creation of open-ended coding problems from closed-ended tasks to enhance the performance of Large Language Models (LLMs) on coding benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing FrontierSmith, an automated system that generates open-ended problem variants from existing competitive programming tasks, using a metric to select diverse solutions and training agents with synthesized data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The synthesized open-ended problems lead to significant performance improvements in LLMs, achieving notable score gains on FrontierCS and ALE-bench, while also promoting more interactive and token-rich agent interactions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14445" target="_blank">https://huggingface.co/papers/2605.14445</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260515233431154.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>47. ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Visual reasoning, agentic operations, functional tokens, latent visual reasoning, RL training</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the research is to develop a visual reasoning framework, ATLAS, that efficiently combines agentic operations and latent representations to improve performance on complex benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces ATLAS, which utilizes functional tokens serving as both agentic operations and latent visual reasoning units without requiring visual supervision, and introduces a method called Latent-Anchored GRPO to stabilize training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research concludes that the ATLAS framework achieves superior performance and clear interpretability in visual reasoning tasks, showing potential for inspiring future research in the field.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15198" target="_blank">https://huggingface.co/papers/2605.15198</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233358967.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>48. EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EvolveMem, adaptive memory, LLM agents, self-evolving memory, structured action space</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop EvolveMem, a self-evolving memory architecture for LLM agents, enabling truly adaptive memory through self-optimization of retrieval mechanisms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a structured action space optimized by an LLM-powered diagnosis module, which performs failure log analysis and configuration adjustments, supported by a meta-analyzer for evolution cycles.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EvolveMem achieves significant performance improvements over baselines, indicating effectiveness in adaptive retrieval strategies and successful transfer across benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13941" target="_blank">https://huggingface.co/papers/2605.13941</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233320674.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>49. RouteProfile: Elucidating the Design Space of LLM Profiles for Routing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM profiling, routing performance, structured profiles, query-level signals, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to understand how LLM profile design affects routing performance across different routers and to distinguish the role of profiles from router design.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A systematic evaluation was conducted across three representative routers under both standard and new-LLM generalization settings, using a general design space of LLM profiles called RouteProfile.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Structured profiles outperform flat ones.</p>
<p>   &#8211; Query-level signals are more reliable than domain-level signals.</p>
<p>   &#8211; Generalization to new models benefits most from structured profiles under trainable configurations, highlighting LLM profile design as crucial for future routing research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.00180" target="_blank">https://huggingface.co/papers/2605.00180</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233253151.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>50. WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WildClawBench, CLI environments, multimodal tasks, Docker container, semantic verification</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate large language and vision-language models on realistic long-horizon tasks using actual CLI environments with real tools instead of synthetic sandboxes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Execution of 60 human-authored, bilingual, multimodal tasks within Docker containers hosting true CLI agent harnesses; uses hybrid grading combining deterministic and semantic verification.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current models like Claude Opus 4.7 achieve only 62.2% under OpenClaw, with significant variability among models, indicating unresolved challenges in long-horizon native-runtime evaluation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10912" target="_blank">https://huggingface.co/papers/2605.10912</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233220583.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>51. Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI Native, Multi-agent systems, coordination, self-improvement, collective intelligence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To provide a unified review and a conceptual roadmap for autonomous, self-improving multi-agent intelligence through structured collaboration and continuous self-diagnosis and improvement.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of the LIFE progression framework: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement; systematic taxonomies and characterization of dependencies between stages.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identification of open challenges at stage boundaries with a cross-stage research agenda, aiming to enhance coordination frameworks toward self-organizing forms of collective intelligence in multi-agent systems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14892" target="_blank">https://huggingface.co/papers/2605.14892</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233153611.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>52. MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: multimodal memory, visual evidence, VLM backbones, temporal tracking, detail extraction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate the capabilities of agent memory in preserving and utilizing visual evidence across various tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of the MemEye framework to assess memory with emphasis on visual evidence granularity and retrieval usage complexity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current architectures struggle with preserving fine-grained visual details and reasoning about visual state changes, emphasizing the need for improved evidence routing, temporal tracking, and detail extraction techniques.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15128" target="_blank">https://huggingface.co/papers/2605.15128</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233128396.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>53. MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: vision-language models, memory capabilities, long-context LVLMs, memory-augmented agents, multimodal retrieval</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces MEMLENS, a benchmark designed to evaluate memory capabilities in vision-language models during multi-session conversations, addressing the lack of systematic comparison between long-context LVLMs and memory-augmented agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The benchmark consists of 789 questions addressing five memory abilities, evaluated at different context lengths, and includes an image-ablation study to assess the necessity of visual evidence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Results suggest that long-context LVLMs excel in short-context accuracy but struggle with prolonged conversations, whereas memory-augmented agents maintain length-stability but compromise on visual fidelity.</p>
<p>   &#8211; Neither approach independently solves the task effectively, indicating a need for hybrid architectures that integrate long-context attention with structured multimodal retrieval capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.14906" target="_blank">https://huggingface.co/papers/2605.14906</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233054671.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>54. Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: causal consistency distillation, frame-wise autoregression, AI-generated video, low-latency video generation, VBench</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enable efficient frame-wise video generation with reduced latency and improved quality compared to existing chunk-wise approaches, especially for real-time interactive applications.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The paper introduces Causal Forcing++, a principled pipeline leveraging causal consistency distillation for few-step AR initialization, enhancing efficiency and optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method outperforms the state-of-the-art chunk-wise causal forcing in frame-wise settings, evidenced by improvements in metrics like VBench Total, VBench Quality, and VisionReward while significantly reducing latency and training cost.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.15141" target="_blank">https://huggingface.co/papers/2605.15141</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260515233024495.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260515233233438.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260515233336130.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260515233933224.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260515234248983.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260515233631017.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260515233431154.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260514</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260514/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Fri, 15 May 2026 00:41:44 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260514/</guid>

					<description><![CDATA[1. MinT: Managed Infrastructure for Training and Serving Millions of LLMs 🔑 Keywords: MinT, Low-Rank Adaptation, LoRA, distributed policy management, large model [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. MinT: Managed Infrastructure for Training and Serving Millions of LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MinT, Low-Rank Adaptation, LoRA, distributed policy management, large model architectures</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces MinT, a managed infrastructure system designed for efficient Low-Rank Adaptation (LoRA) training and serving, focusing on scaling across large model architectures and improving policy management.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; MinT utilizes a service interface to manage base models while exporting lightweight adapter revisions through processes like rollout, update, export, and evaluation. It scales in three ways: Scale Up for expansive model architectures, Scale Down for reduced storage, and Scale Out for effective policy addressability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MinT successfully manages million-scale LoRA policy catalogs by optimizing training and serving processes for selected adapter revisions over large 1T-class base models, improving both efficiency and scalability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13779" target="_blank">https://huggingface.co/papers/2605.13779</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233008064.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AnyFlow, video diffusion, flow-map transition, consistency distillation, ODE sampling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce AnyFlow, a novel any-step video diffusion distillation framework that optimizes sampling trajectories using flow-map transition learning and backward simulation techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Shifting the distillation target to optimize the full ODE sampling trajectory and proposing Flow Map Backward Simulation for efficient on-policy distillation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AnyFlow matches or surpasses consistency-based methods in few-step video generation performance, demonstrating its effectiveness in scaling with sampling step budgets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13724" target="_blank">https://huggingface.co/papers/2605.13724</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260514233037843.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EVA-Bench, Voice agents, Composite metrics, Task completion, Noise robustness</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop a comprehensive evaluation framework, EVA-Bench, for voice agents to simulate realistic conversations and measure performance effectively across voice-specific failure modes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a bot-to-bot audio conversation system for dynamic multi-turn dialogues and automatic simulation validation, incorporating two composite metrics EVA-A (Accuracy) and EVA-X (Experience) for performance measurement across different agent architectures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Found no system exceeds 0.5 on both EVA-A and EVA-X pass@1 metrics simultaneously. Noted substantial divergence between peak and reliable performance and observed significant robustness gaps due to accent and noise perturbations, with variation across architectures and metrics.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13841" target="_blank">https://huggingface.co/papers/2605.13841</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233108908.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. Qwen-Image-VAE-2.0 Technical Report</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Variational Autoencoders, reconstruction fidelity, Global Skip Connections, semantic alignment, diffusability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve the reconstruction fidelity and diffusability of high-compression Variational Autoencoders with enhanced architectures and training strategies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized improved architecture with Global Skip Connections and expanded latent channels.</p>
<p>   &#8211; Scaled training to billions of images and used a synthetic rendering engine.</p>
<p>   &#8211; Implemented asymmetric, attention-free encoder-decoder and enhanced semantic alignment strategies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Qwen-Image-VAE-2.0 achieves state-of-the-art performance, excelling in both general and text-rich scenarios with high compression ratio.</p>
<p>   &#8211; Demonstrated superior diffusability, significantly enhancing convergence speed compared to existing models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13565" target="_blank">https://huggingface.co/papers/2605.13565</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233135379.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Dense 3D tracking, Monocular video, Video diffusion transformers, Dual-latent representation, Temporal RoPE alignment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enable efficient dense 3D tracking from monocular video by adapting video diffusion transformers with dual-latent representation and temporal RoPE alignment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of dual-latent representation for per-frame geometry and reference-anchored tracking.</p>
<p>   &#8211; Temporal RoPE alignment to specify target time for tracking, while using LoRA fine-tuning to convert generative models into reference-anchored tracking formulations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TrackCraft3R achieves state-of-the-art performance in dense 3D tracking benchmarks, excelling in speed and memory usage, and shows robustness to large motions and long videos.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12587" target="_blank">https://huggingface.co/papers/2605.12587</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233208645.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. FrameSkip: Learning from Fewer but More Informative Frames in VLA Training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FrameSkip, Vision-Language-Action, action variation, temporal supervision imbalance, visual-action coherence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance VLA policy training by introducing FrameSkip, a method that prioritizes high-importance frames based on specific metrics, to address the temporal supervision imbalance in dense robot demonstration trajectories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs FrameSkip, a data-layer frame selection framework, which scores trajectory frames using criteria such as action variation and visual-action coherence, and remaps training samples towards frames of higher importance while maintaining a target retention ratio.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FrameSkip improves the success-retention trade-off in VLA policy training, achieving a higher macro-average success rate of 76.15% compared to 66.50% with full-frame training across benchmark environments, while preserving only 20% of unique frames.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13757" target="_blank">https://huggingface.co/papers/2605.13757</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233236604.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. Asymmetric Flow Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Asymmetric Flow Modeling, Flow-based generation, Low-rank structure, Pixel diffusion models, Text-to-image generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective was to develop a method, Asymmetric Flow Modeling (AsymFlow), that achieves efficient high-dimensional flow-based generation by restricting noise prediction to low-rank subspaces while maintaining full data prediction capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; AsymFlow utilizes a rank-asymmetric velocity parameterization to control noise prediction, enabling full-dimensional velocity recovery without altering the network architecture or the training/sampling procedures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AsymFlow achieved state-of-the-art performance in pixel-space text-to-image generation, demonstrated by a leading 1.57 FID on ImageNet 256&#215;256, and showcased significant visual realism improvements when finetuned from pretrained latent models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12964" target="_blank">https://huggingface.co/papers/2605.12964</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233307139.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: weighted multi-relational memory, query-conditioned traversal, relational memory graph, reinforcement learning, adaptive memory retrieval</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to improve long-horizon reasoning accuracy in AI systems through the introduction of HAGE, a weighted multi-relational memory framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; HAGE reorganizes memory as sequential, query-conditioned traversal over a unified relational memory graph, with edge embeddings modulated by a routing network.</p>
<p>   &#8211; Utilizes a reinforcement learning-based training framework to optimize routing behavior and edge representations with respect to downstream tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The HAGE framework demonstrates improved reasoning accuracy and an advantageous accuracy-efficiency trade-off compared to existing agentic memory systems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09942" target="_blank">https://huggingface.co/papers/2605.09942</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233332711.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Dual-architecture framework, Autoregressive Large Language Models, Parallel token generation, Diffusion models, Consensus mechanism</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Orthrus, which aims to unify exact generation fidelity of autoregressive LLMs with the high-speed parallel generation capabilities of diffusion models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a dual-architecture framework augmenting a frozen LLM with a trainable module to facilitate both autoregressive and parallel diffusion views, maintaining a shared KV cache for ensuring precise inference.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Orthrus achieves a balance between high-speed parallel generation and exact inference fidelity, providing up to a 7.8x speedup with minimal additional memory and parameter overhead.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12825" target="_blank">https://huggingface.co/papers/2605.12825</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233359496.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RoboEvolve, vision-language models, video generation models, co-evolutionary loop, continual learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to enhance robotic manipulation scalability through a framework that integrates Vision-Language Models (VLMs) and Video Generation Models (VGMs) to improve data efficiency and foster continuous learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The approach involves a co-evolutionary loop combining VLMs and VGMs, employing a cognitive-inspired dual-phase mechanism with daytime exploration for behavioral discovery and nighttime consolidation to optimize policies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RoboEvolve significantly enhances performance metrics, achieves high data efficiency with a 50x reduction in labeled data needs, and demonstrates robust continual learning capabilities without catastrophic forgetting.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13775" target="_blank">https://huggingface.co/papers/2605.13775</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233428739.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Rectified Flow, PNAPO, Preference Optimization, Noise-Image Interpolation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Address the limitation of existing preference datasets in text-to-image models by introducing Prior Noise-Aware Preference Optimization (PNAPO) for rectified flow models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Augment preference data with noise samples and employ noise-image interpolation for effective trajectory estimation.</p>
<p>   &#8211; Introduce a dynamic regularization strategy based on reward gap and training progress.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PNAPO improves preference metrics and reduces training compute when applied to state-of-the-art rectified flow text-to-image models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09433" target="_blank">https://huggingface.co/papers/2605.09433</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233455359.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LEAD, Chain-of-Thought, Reinforcement Learning, Efficiency Rewards, Accuracy-Efficiency Score</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance mathematical reasoning accuracy and efficiency by dynamically adapting reasoning efficiency with the LEAD method.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; LEAD employs online calibration of correctness-efficiency trade-offs and adaptive problem-specific length targets to replace static heuristics, using Potential-Scaled Instability for optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LEAD achieves superior accuracy and Accuracy-Efficiency Score on five mathematical reasoning benchmarks, offering shorter outputs versus the base model while excelling among RL-trained reasoning methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09806" target="_blank">https://huggingface.co/papers/2605.09806</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233526313.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Map-then-Act Paradigm, Delayed Environmental Perception, Epistemic Bottleneck, Knowledge-Augmented Execution, Global Exploration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Address the limitations of interactive LLM agents in environmental perception during execution by introducing the Map-then-Act Paradigm (MAP).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proposed a framework consisting of three stages: Global Exploration, Task-Specific Mapping, and Knowledge-Augmented Execution, aimed at improving agents&#8217; environmental understanding before execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated that the MAP paradigm provides consistent gains in benchmarks, enabling better performance in various environments and highlighting the importance of environment understanding over mere imitation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13037" target="_blank">https://huggingface.co/papers/2605.13037</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233553957.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. FeatCal: Feature Calibration for Post-Merging Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Feature drift, Model merging, Calibration method, Sample efficiency, Closed-form solution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the study is to address performance gaps in model merging through the analysis of feature drift and to propose a calibration method called FeatCal to enhance efficiency and benchmark results.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research utilizes a theoretical framework to decompose feature drift into upstream propagation and local mismatch. FeatCal employs a layer-wise calibration of merged model weights using a small calibration set, executed through an efficient closed-form solution without gradient descent.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FeatCal significantly outperforms existing calibration baselines on benchmarks such as CLIP and GLUE, demonstrating better sample efficiency and reduced calibration costs, while maintaining the benefits of model merging.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13030" target="_blank">https://huggingface.co/papers/2605.13030</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233621668.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. Retrieval from Within: An Intrinsic Capability of Attention-Based Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Intrinsic Retrieval, Attention-based models, retrieval-augmented generation, evidence recall, answer quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To determine if an attention-based encoder-decoder can retrieve directly from its internal representations, unifying retrieval and generation processes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced the INTRA framework, where decoder attention queries score pre-encoded evidence chunks reused as context for generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical in RAG pipelines.</p>
<p>   &#8211; Demonstrates superior performance over traditional retrieval pipelines in evidence recall and end-to-end answer quality on question-answering benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05806" target="_blank">https://huggingface.co/papers/2605.05806</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233648148.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Software Engineering Agents, AgentLens, Lucky Pass, Process-level Assessment, Quality Scores</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to evaluate software engineering (SWE) agents using a process-level framework to differentiate between effective and ineffective approaches by identifying patterns like Lucky Passes and providing quality scoring.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Evaluation of 2,614 OpenHands trajectories from eight model backends on 60 SWE-bench Verified tasks, constructing task-level process references and using AgentLens framework for quality scoring and pattern analysis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The process-level framework, AgentLens, distinguishes passing trajectories into Lucky, Solid, and Ideal tiers. Lucky Passes are further broken down into recurring mechanisms, with quality scores showing variance in model effectiveness, influencing ranking positions significantly.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12925" target="_blank">https://huggingface.co/papers/2605.12925</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233718824.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated summary, On-policy distillation, Reward extrapolation, Lambda, Format-preserving</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study delves into the effects of increasing the reward-extrapolation coefficient in On-policy Distillation and its impact on structured-output tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a single-position Bernoulli reduction to derive a clip-safety threshold and extended rules to calibrated K-ary listwise JSON tasks with empirical validation through various tests.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identified a critical threshold (lambda*) where operating above it causes a shift from format-preserving to format-collapsing in structured-output tasks, achieving performance parity with significant parameter reduction.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08737" target="_blank">https://huggingface.co/papers/2605.08737</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233817585.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. Position: LLM Inference Should Be Evaluated as Energy-to-Token Production</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Energy-to-token, Token Production Function, Joules/token, Operational efficiency, PUE</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to evaluate LLM inference as energy-to-token production, emphasizing factors beyond traditional metrics like accuracy and latency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A Token Production Function is formalized to analyze token production under constraints such as compute-per-token and energy-per-token.</p>
<p>   &#8211; System optimizations like KV-cache compression and quantization are considered energy-to-token levers.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study suggests that inference papers should report energy-related metrics (e.g., Joules/token) alongside accuracy and latency.</p>
<p>   &#8211; System optimizations should focus on reducing resource consumption under fixed quality and service targets. </p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11733" target="_blank">https://huggingface.co/papers/2605.11733</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233747265.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. An Empirical Study of Automating Agent Evaluation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Automated agent evaluation, EvalAgent, AI assistants, evaluation skills, meta-evaluation framework</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the efficacy of AI assistants, specifically EvalAgent, in automating the end-to-end agent evaluation process with improved reliability and accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of EvalAgent, an AI tool that employs specialized evaluation skills and a trace-based pipeline to produce evaluation artifacts.</p>
<p>   &#8211; Development of a meta-evaluation framework and the AgentEvalBench benchmark to systematically assess the performance of generated evaluations.</p>
<p>   &#8211; Proposal of the Eval@1 metric to evaluate the effectiveness of evaluation code execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EvalAgent significantly enhances evaluation accuracy, as evidenced by an increase in the Eval@1 metric from 17.5% to 65%.</p>
<p>   &#8211; EvalAgent demonstrates a 79.5% preference rate over baseline methods according to human expert evaluation.</p>
<p>   &#8211; The integration of evaluation skills is crucial, with their removal causing Eval@1 performance to drop to 30%.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11378" target="_blank">https://huggingface.co/papers/2605.11378</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233955568.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated summary, Large Language Models, autoregressive model, factorized group-relative policy optimization, end-to-end optimization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Propose a unified framework to address the credit assignment challenges in end-to-end retrieval optimization by combining candidate generation and ranking in a single autoregressive model.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Factorized group-relative policy optimization (F-GRPO) is used, which factorizes the policy into candidate generation and ranking while utilizing a single LLM backbone.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The framework improves top-ranked performance over existing GRPO and decoupled baselines and outperforms supervised alternatives; it remains competitive with strong zero-shot rerankers without requiring architectural changes at inference time.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12995" target="_blank">https://huggingface.co/papers/2605.12995</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233919798.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multilingual Dialogue Systems, Indic Languages, Personalized Symptom Elicitation, Parameter-efficient Adaptation, Clinical Plasibility</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset covering English and nine Indic languages, for realistic and multilingual medical consultations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Extend MDDial with LLM-generated consultations, translated with TranslateGemma, verified and refined for accuracy.</p>
<p>   &#8211; Fine-tune IndicMedLM via parameter-efficient adaptation, incorporating patient pre-context for personalized interactions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Evaluated against zero-shot multilingual baselines and performed systematic error analysis, validating clinical plausibility through medical expert evaluation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13292" target="_blank">https://huggingface.co/papers/2605.13292</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233847168.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FAAST, task adaptation, forward-only computation, fast weights, pretrained models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce FAAST, a forward-only associative adaptation method aimed at efficient task adaptation by using fast weights.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implement a method that compiles labeled examples into fast weights through a single-pass forward-only computation, eliminating dependence on memory or context.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FAAST achieves significant improvements in speed and memory efficiency, reducing adaptation time by over 90% and saving memory usage by up to 95% compared to traditional methods like backpropagation and memory/context-based adaptation. This makes FAAST a highly efficient solution for resource-constrained models across image classification and language modeling benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04651" target="_blank">https://huggingface.co/papers/2605.04651</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234120210.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. From Pixels to Concepts: Do Segmentation Models Understand What They Segment?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CAFE, concept-faithful segmentation, promptable models, counterfactual manipulation, semantic grounding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to evaluate concept-faithful segmentation in promptable segmentation models using a new benchmark called CAFE (Counterfactual Attribute Factuality Evaluation).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The CAFE benchmark assesses models through attribute-level counterfactual manipulation by modifying attributes such as surface appearance, context, or material composition while preserving the target region and ground-truth mask.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments using CAFE reveal a gap between localization quality and concept discrimination, indicating that accurate mask prediction does not always ensure faithful semantic grounding in models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09591" target="_blank">https://huggingface.co/papers/2605.09591</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234050182.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. ShapeCodeBench: A Renewable Benchmark for Perception-to-Program Reconstruction of Synthetic Shape Scenes</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ShapeCodeBench, perception-to-program reconstruction, raster image, executable drawing program, synthetic benchmark</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a benchmark named ShapeCodeBench for evaluating models in generating executable drawing programs from raster images.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize a synthetic benchmark that generates raster images for models to emit executable drawing programs, evaluated using metrics like exact match and pixel accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The benchmarking results show competitive performance in specific scenarios, yet the exact match rates are low, indicating potential for further improvements and developments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11680" target="_blank">https://huggingface.co/papers/2605.11680</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234022768.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>25. Active Tabular Augmentation via Policy-Guided Diffusion Inpainting</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Tabular Augmentation Policy, diffusion inpainting, learner-conditioned policy, data-scarce domains</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the fidelity-utility gap in generative tabular augmentation, particularly in data-scarce domains, by focusing on not just generating data but optimizing when and what to generate to improve downstream model performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The proposed method, TAP (Tabular Augmentation Policy), combines diffusion inpainting with a lightweight, learner-conditioned policy to direct data generation towards high-utility regions, ensuring safe data augmentation through explicit gating and conservative windowed commitment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TAP consistently outperformed strong generative baselines in data-scarce environments, achieving up to a 15.6 percentage point increase in classification accuracy and reducing regression RMSE by up to 32% across seven real-world datasets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10315" target="_blank">https://huggingface.co/papers/2605.10315</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234257350.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>26. Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: variational inference, exploration-aware, reinforcement learning, text-based benchmarks, GUI-based benchmarks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to develop an exploration-aware reinforcement learning framework that enables Large Language Model (LLM) agents to selectively explore actions only when the uncertainty is high.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The proposed method integrates a fine-grained reward function through variational inference, which evaluates exploratory actions by estimating their potential to enhance future decision-making. Additionally, it employs an exploration-aware grouping mechanism for optimizing these actions separately from task-completion actions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The framework demonstrated consistent performance improvements across diverse and challenging text-based and GUI-based agent benchmarks, allowing for selective exploration and efficient transition to task execution.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08978" target="_blank">https://huggingface.co/papers/2605.08978</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234225755.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>27. Federation of Experts: Communication Efficient Distributed Inference for Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mixture of Experts, Federation of Experts, KV heads, Inference Throughput, Latency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance computational efficiency in Large Language Models by restructuring mixture of experts blocks into clusters, thus improving communication bottlenecks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The novel Federation of Experts (FoE) architecture is introduced, restructuring transformer layer MoE blocks into multiple clusters each responsible for KV heads, and reducing communication overhead in both single-node and multi-node settings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The implementation of FoE demonstrates a significant improvement in inference throughput and latency, reducing forward-pass latency by up to 5.2x, while maintaining generation quality similar to the original Mixture of Experts model.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06206" target="_blank">https://huggingface.co/papers/2605.06206</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234150158.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>28. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605141778802188.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>29. WriteSAE: Sparse Autoencoders for Recurrent State</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WriteSAE, Sparse Autoencoder, Hybrid Recurrent Language Models, Matrix Cache Write, Token-level Interventions</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces WriteSAE, a new sparse autoencoder enabling decomposition and editing of matrix cache writes in state-space and hybrid recurrent language models to enhance token-level interventions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; WriteSAE uniquely decomposes decoder atoms into native write shapes and trains under matched Frobenius norm, allowing effective token intervention by substituting atoms and predicting logit shifts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; WriteSAE significantly outperforms existing models like Gated DeltaNet and Mamba-2 in token interventions, with successful applications demonstrated in matrix-recurrent write sites, elevating performance metrics dramatically.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12770" target="_blank">https://huggingface.co/papers/2605.12770</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234242171.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>30. FlowCompile: An Optimizing Compiler for Structured LLM Workflows</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FlowCompile, structured LLM workflows, compile-time exploration, accuracy-latency trade-offs, workflow optimization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To optimize complex multi-agent tasks in AI by exploring workflow configurations at compile-time to balance accuracy and latency without retraining.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce FlowCompile, a workflow compiler that decomposes workflows into sub-agents, profiles them under diverse configurations, and estimates workflow-level accuracy and latency using a structure-aware proxy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FlowCompile consistently outperforms existing heuristically optimized and routing-based configurations, delivering up to a 6.4x speedup and creating a reusable optimization artifact.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13647" target="_blank">https://huggingface.co/papers/2605.13647</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234207787.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>31. SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated summary, Large Language Model (LLM) agents, decision boundaries, context-aware defense rules, self-evolution mechanism</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To design SafeHarbor, a novel framework for LLM agents that establishes precise decision boundaries by using context-aware defense rules to enhance safety without compromising utility.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing a hierarchical memory system and an information entropy-based self-evolution mechanism to support dynamic rule injection for improved safety and performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SafeHarbor achieves state-of-the-art performance by maintaining a high refusal rate for harmful requests while also ensuring significant benign task utility, as demonstrated in experiments with GPT-4o.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05704" target="_blank">https://huggingface.co/papers/2605.05704</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234136371.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>32. Source or It Didn&#8217;t Happen: A Multi-Agent Framework for Citation Hallucination Detection</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated summary, citation hallucination detection, multi-agent detector</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of this research is to develop a system that can accurately detect fabricated citations in scientific writing by classifying them into a 12-code taxonomy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research employs a cascading multi-agent system, CiteTracer, which extracts structured citations, retrieves evidence, applies deterministic field matching, and routes ambiguous cases to specialized judges.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CiteTracer demonstrates high accuracy, achieving 97.1% on a synthetic benchmark and effectively detecting 97.1% of fabrications in real-world citations without abstaining.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08583" target="_blank">https://huggingface.co/papers/2605.08583</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234107067.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>33. M2Retinexformer: Multi-Modal Retinexformer for Low-Light Image Enhancement</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multi-Modal, Retinexformer, Depth Cues, Cross-Attention, Adaptive Gating</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance low-light image quality by integrating depth cues, luminance priors, and semantic features through a novel multi-modal framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of cross-attention fusion and adaptive gating mechanisms within a framework called M2Retinexformer for processing multi-scale modalities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed framework demonstrates improved performance over existing Retinexformer and state-of-the-art methods on LOL, SID, SMID, and SDSD benchmark datasets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12556" target="_blank">https://huggingface.co/papers/2605.12556</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234037183.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>34. From Generalist to Specialist Representation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Nonparametric Identifiability, Generalist Model, Specialist Representation, Task-Relevant Latent Representation, Sparsity Regularization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Establish nonparametric identifiability for task-relevant representations from generalist models without relying on parametric assumptions or interventions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proven identifiability of structures between time steps and tasks in a nonparametric and unsupervised setting.</p>
<p>   &#8211; Demonstrated disentanglement of task-relevant representations using a simple sparsity regularization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Provides a hierarchical foundation for task structure identifiability across time steps and disentanglement of task-relevant representations within each step, marking a step toward transitioning from generalist to specialist models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12733" target="_blank">https://huggingface.co/papers/2605.12733</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514234010231.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>35. Frequency Bias and OOD Generalization in Neural Operators under a Variable-Coefficient Wave Equation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Neural Operators, PDE Solving, Fourier Neural Operator, Deep Operator Network, Distribution Shifts</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to investigate the generalization behaviors of neural operators for partial differential equations (PDEs) under distribution shifts, specifically focusing on smoothness and frequency variations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The analysis was conducted using two neural operator architectures, the Fourier Neural Operator (FNO) and the Deep Operator Network (DeepONet), in a one-dimensional wave propagation context. The study evaluated their performance under structured out-of-distribution (OOD) settings that vary input frequency and coefficient smoothness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Both FNO and DeepONet maintain stable performance under smoothness shifts, with FNO achieving lower error. However, under frequency shifts, FNO shows a sharp increase in error for high-frequency inputs, while DeepONet presents milder degradation. The findings suggest a fundamental gap between in-distribution performance and generalization under distribution shifts, highlighting the importance of addressing architectural representation bias in developing reliable neural operators for physics-based PDE simulations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12997" target="_blank">https://huggingface.co/papers/2605.12997</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233942418.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>36. PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PersonalAI 2.0, Knowledge Graphs, GraphRAG, Information-Retention Score</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance large language model-based systems by integrating external knowledge graphs to improve factual correctness and precision in answer generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of dynamic, multistage query processing pipelines and adaptive information search mechanisms.</p>
<p>   &#8211; Implementation of graph traversal algorithms like BeamSearch and WaterCircles to outperform standard retrieval methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PersonalAI 2.0 demonstrates an average 4% gain in reducing hallucination rates and boosting precision when evaluated by LLM-as-a-Judge across multiple benchmarks.</p>
<p>   &#8211; An 18% boost is achieved with enabled search plan enhancements over six datasets, highlighting its potential as a foundational model for personalized AI applications.</p>
<p>   &#8211; The ablation study reveals a state-of-the-art result on the MINE-1 benchmark, achieving an 89% information-retention score.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13481" target="_blank">https://huggingface.co/papers/2605.13481</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233905374.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>37. MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Riemannian flow-matching, few-shot adaptation, hyperbolic factor, Euclidean factor, Transformer backbones</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a novel framework (MC-RFM) for few-shot adaptation in mixed-curvature manifolds to outperform existing methods across benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop a product manifold approach combining hyperbolic and Euclidean spaces to model feature displacement.</p>
<p>   &#8211; Implement adaptation through task-conditioned continuous transport with a flow-matching objective.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MC-RFM outperforms current methods in visual recognition benchmarks, particularly on Transformer backbones and fine-grained datasets, by effectively modeling feature representation geometry.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08557" target="_blank">https://huggingface.co/papers/2605.08557</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233831213.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>38. MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MemReread, long-context reasoning, agent memory, question decomposition, reinforcement learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address long-context reasoning challenges while maintaining linear time complexity, avoiding intermediate retrieval through question decomposition and rereading.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed MemReread based on streaming reading to trigger question decomposition and rereading, utilizing a reinforcement learning framework for dynamic control over computational overhead.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MemReread outperforms baseline frameworks on long-context reasoning tasks while preserving logical flow, maintaining linear time complexity.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10268" target="_blank">https://huggingface.co/papers/2605.10268</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233802468.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>39. Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal large language models, Visual Aesthetic Benchmark, comparative selection, fine-tuning, expert judgment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Examine the alignment of multimodal models with human expert aesthetic judgment, focusing on the effectiveness of scalar score predictions versus direct ranking comparisons in visual tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced the Visual Aesthetic Benchmark (VAB) featuring 400 tasks across various visual disciplines to evaluate models on comparative selection using expert consensus labels.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study highlights a significant performance gap between multimodal models and human experts in aesthetic judgment, with only 26.5% task accuracy compared to 68.9% by humans. Fine-tuning on expert data significantly improves model performance, indicating the transferability of the VAB comparative signal.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12684" target="_blank">https://huggingface.co/papers/2605.12684</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233733819.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>40. Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: studio-bias, multilingual ASR, Vividh-ASR, R-MFT, Whisper</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to identify and address the studio-bias in multilingual ASR models used for low-resource languages, improving spontaneous speech performance without sacrificing efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of Vividh-ASR, a complexity-stratified benchmark, and the implementation of controlled studies on learning-rate timing and curriculum ordering. The reverse multi-stage fine-tuning (R-MFT) approach was applied to compare performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Early large parameter updates and a hard-to-easy curriculum significantly enhance performance on spontaneous speech, achieving efficiency in parameters.</p>
<p>   &#8211; The R-MFT method allows a 244M Whisper model to match or exceed the performance of larger models, ensuring the preservation of acoustic geometry in the encoder while concentrating adaptation in the decoder.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13087" target="_blank">https://huggingface.co/papers/2605.13087</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233703336.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>41. BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continuous authentication, Behavioral biometrics, Multimodal dataset, Esports, Behavioral profiling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce BEACON, a large-scale, multimodal dataset to advance research in continuous authentication and behavioral biometrics in competitive gaming environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Captures fine-grained behavioral signals in competitive Valorant gameplay across diverse skill tiers using extensive data from multiple modalities, including mouse dynamics, keystrokes, and network captures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Provides a comprehensive benchmark for studying continuous authentication, user drift, and behavioral profiling in high-cognitive load esports settings; enables the development and testing of next-generation behavioral fingerprinting and security models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10867" target="_blank">https://huggingface.co/papers/2605.10867</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233635294.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>42. Context Training with Active Information Seeking</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Context Optimization, Active Information Seeking, Large Language Models, Search-Based Training, Downstream Tasks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance context optimization methods with active information seeking tools like Wikipedia search and browser tools to improve performance in diverse domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Integration of search and browser tools with context optimizers in large language models, utilizing a search-based training procedure to maintain and prune multiple candidate contexts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Active information seeking enhances performance and delivers consistent gains across low-resource translation, health scenarios, and reasoning tasks. The method is also data-efficient and robust across various hyperparameters.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13050" target="_blank">https://huggingface.co/papers/2605.13050</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233607599.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>43. Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Privacy-Aware, Codabench, Multi-Agent Orchestration, Hidden Evaluation, Leaderboard</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To analyze the CODS 2025 challenge, focusing on the metrics of leaderboard evaluation, impact of hidden evaluations, and rewarded design patterns.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized data from rank sheets, submission logs, team registrations, best-submission exports, organizer reports, system papers, and source trees for verified planning tracks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The public planning leaderboard reaches saturation at 72.73%, with no improvement from richer prompts.</p>
<p>   &#8211; Hidden evaluations show moderate correlation between public and private scores in planning, but negative correlation in execution.</p>
<p>   &#8211; The official composite score is minimally affected by numerical changes, with possible impact on team rankings.</p>
<p>   &#8211; Operationally account-based, yet substantively team-based results are evident, with a significant reduction in effective team count from registrations.</p>
<p>   &#8211; Successful execution methods prioritize improving guardrails like response selection and context control over novel agent architectures.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08518" target="_blank">https://huggingface.co/papers/2605.08518</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233541955.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>44. Revisiting DAgger in the Era of LLM-Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DAgger, Long-Horizon, Supervised Fine-Tuning, Reinforcement Learning, Covariate Shift</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to improve long-horizon language model (LM) agents by combining the strengths of supervised fine-tuning and reinforcement learning through a DAgger-style training approach.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research employs a hybrid DAgger algorithm that interpolates between student and teacher policies for trajectory collection, directly interacting with environments to mitigate covariate shift and enhance feedback quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The DAgger-style training significantly enhances the performance of LM agents, as demonstrated with software-engineering agents, surpassing existing 4B and 8B post-training baselines, and closing the gap with larger models on evaluation benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12913" target="_blank">https://huggingface.co/papers/2605.12913</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233514164.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>45. RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RealICU, ICU, Large Language Models, Clinical Recommendation, Patient Trajectory</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to evaluate large language models (LLMs) for decision support in intensive care units (ICUs) using a hindsight-annotated benchmark, RealICU, to reveal limitations in clinical recommendation accuracy and early interpretation bias.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers introduce RealICU, formulating tasks such as assessing patient status and recommending actions, using hindsight annotations. Two datasets, RealICU-Gold and RealICU-Scale, were created to gauge the performance of existing LLMs in a realistic ICU environment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RealICU reveals two failure modes in existing LLMs: a recall-safety tradeoff and an anchoring bias to early interpretations. The study also introduces ICU-Evo to improve long-horizon reasoning, yet safety failures persist, emphasizing the need for enhanced AI decision-support systems in high-stakes care.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13542" target="_blank">https://huggingface.co/papers/2605.13542</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233442813.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>46. PresentAgent-2: Towards Generalist Multimodal Presentation Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PresentAgent-2, presentation video generation, multimodal media, agentic framework, interaction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop PresentAgent-2, an agentic framework for generating presentation videos from user queries, leveraging multimodal resources and interactive delivery.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized deep research on presentation-friendly sources to collect multimodal resources (text, images, GIFs, videos).</p>
<p>   &#8211; Constructed and evaluated three presentation modes: Single Presentation, Discussion, and Interaction, with criteria focused on content quality and media use.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PresentAgent-2 advances presentation generation from static slides to a dynamic, query-driven, and research-grounded process, integrating multimodal media and interactive engagement.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11363" target="_blank">https://huggingface.co/papers/2605.11363</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260514233413298.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>47. Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multi-hop question answering, Retrieval-Augmented Generation, Program synthesis, Reasoning process, Deterministic feedback</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Reformulate multi-hop question answering as program synthesis and execution for enhanced structured reasoning and performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce \pyrag framework, representing reasoning processes as executable Python programs rather than free-form reasoning, and facilitate compiler-grounded self-repair and execution-driven adaptive retrieval.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The \pyrag framework demonstrates significant performance improvements over existing methods on diverse QA benchmarks and makes publicly available models and resources.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12975" target="_blank">https://huggingface.co/papers/2605.12975</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233345352.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>48. Learning Agentic Policy from Action Guidance</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Agentic reinforcement learning, Large Language Models, exploration capability, action data, supervised fine-tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a method to enhance exploration capabilities in Large Language Models using agentic reinforcement learning with action data from human interactions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize action data as plan-style reference guidance to improve the agentic policy&#8217;s ability to reach reward states.</p>
<p>   &#8211; Implement mixed-policy training to combine guided and unguided rollouts, minimizing off-policy risk through a minimal intervention principle.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed ActGuide-RL method significantly improves performance on search-agent benchmarks, reducing reliance on extensive supervised fine-tuning data by using scalable action guidance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12004" target="_blank">https://huggingface.co/papers/2605.12004</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233319441.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>49. The DAWN of World-Action Interactive Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: World-Action Interactive Models, Autonomous Driving, DAWN, Latent Generative Baseline, Long-Horizon Planning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance autonomous driving by integrating scene evolution and action planning through World-Action Interactive Models (WAIMs).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research introduces the DAWN model, which leverages a semantic latent space to couple a World Predictor with a World-Conditioned Action Denoiser for recursive refinement during inference.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DAWN demonstrates strong planning performance and safety on multiple autonomous driving benchmarks, supporting interactive world-action generation as a promising approach for effective long-horizon trajectory generation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11550" target="_blank">https://huggingface.co/papers/2605.11550</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233252859.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>50. Many-Shot CoT-ICL: Making In-Context Learning Truly Learn</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: In-context learning, large language models, chain-of-thought, test-time learning, Curvilinear Demonstration Selection</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To study the scaling behaviors of many-shot in-context learning (ICL) for reasoning tasks and contrast it with non-reasoning tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Examination of many-shot chain-of-thought ICL across various language models to analyze the effects of CoT demonstrations, retrieval methods, and demonstration ordering.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Standard many-shot rules do not apply universally across reasoning tasks.</p>
<p>   &#8211; Increasing CoT demonstrations benefits reasoning-oriented models but can destabilize non-reasoning models.</p>
<p>   &#8211; Similarity-based retrieval is ineffective for reasoning tasks due to poor procedural compatibility prediction.</p>
<p>   &#8211; Performance variability arises with more CoT demonstrations.</p>
<p>   &#8211; Introduced Curvilinear Demonstration Selection method, achieving improved performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13511" target="_blank">https://huggingface.co/papers/2605.13511</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233221551.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>51. Edit-Compass &amp; EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Image Editing Models, Evaluation Protocols, Reward Models, Structured Reasoning, Scoring Rubrics</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Edit-Compass and EditReward-Compass, a unified evaluation suite designed to address challenges in assessing image editing and reward models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Edit-Compass involves 2,388 annotated instances across six task categories with multi-dimensional evaluation using structured reasoning and scoring rubrics.</p>
<p>   &#8211; EditReward-Compass uses 2,251 preference pairs to simulate realistic reward modeling scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Existing benchmarks are inadequate for evaluating cutting-edge image editing and reward models; the new suite aims to provide a more reliable assessment framework.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13062" target="_blank">https://huggingface.co/papers/2605.13062</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233154143.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>52. Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, negotiation games, LLM-based, target-adaptive text-tabular, decision-oriented feature</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate whether AI agents can predict the decisions of unfamiliar counterparts in negotiation games by using target-adaptive text-tabular prediction.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; AI agents utilize a tabular foundation model combining structured game state and LLM-based text representations, enhanced by a frozen LLM as an Observer for decision-oriented features.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study demonstrates that the formulated target-adaptive text-tabular approach effectively predicts counterpart decisions, outperforming direct prompting methods and showing improved prediction accuracy.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.12411" target="_blank">https://huggingface.co/papers/2605.12411</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233123393.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>53. Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Long-context modeling, Vision-language models, Long-document VQA, Retrieval-heavy mixtures, MMProLong</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Enhance vision-language models&#8217; capability to manage long-context data effectively and improve performance in long-document understanding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conducted a systematic study on long-context continued pre-training for a 7B model, exploring the design and balancing of long-context data mixtures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Long-document VQA is more effective than OCR transcription, with balanced sequence-length distribution outperforming target-length-focused approaches.</p>
<p>   &#8211; Retrieval-heavy data mixtures improve task diversity and long-context capabilities, preserving short-context abilities.</p>
<p>   &#8211; MMProLong, developed with a limited 5B-token budget, enhances performance on long-document VQA by 7.1% and generalizes well to multiple long-context tasks without additional training.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.13831" target="_blank">https://huggingface.co/papers/2605.13831</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233055322.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>54. MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal Tabular Learning, Tabular Foundation Models, pretrained embeddings, MulTaBench</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to improve performance in Multimodal Tabular Learning by demonstrating that task-specific embedding tuning is beneficial, especially when modalities provide complementary predictive signals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of MulTaBench, a benchmark of 40 datasets divided between image-tabular and text-tabular tasks, focusing on predictive tasks where task-specific tuning is required.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Target-aware representation tuning enhances performance across text and image modalities, and various tabular learners, proving beneficial for developing new Multimodal Tabular Foundation Models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10616" target="_blank">https://huggingface.co/papers/2605.10616</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260514233025896.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260514233037843.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260514233413298.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260512</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260512/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Wed, 13 May 2026 00:42:12 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260512/</guid>

					<description><![CDATA[1. Qwen-Image-2.0 Technical Report 🔑 Keywords: Qwen-Image-2.0, high-fidelity synthesis, precise image editing, Multimodal Diffusion Transformer 💡 Category: Generative Models 🌟 Research Objective: [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Qwen-Image-2.0 Technical Report</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Qwen-Image-2.0, high-fidelity synthesis, precise image editing, Multimodal Diffusion Transformer</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Qwen-Image-2.0, unifying high-fidelity image generation and editing within a single framework, addressing challenges in ultra-long text rendering and multilingual typography.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Incorporates Qwen3-VL as a condition encoder and a Multimodal Diffusion Transformer for joint modeling, supported by large-scale data curation and a customized multi-stage training pipeline.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Qwen-Image-2.0 improves multilingual text fidelity, typography, and photorealistic generation, significantly outperforming previous models in generation and editing capabilities, enhancing its reliability and practicality as an image generation foundation model.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10730" target="_blank">https://huggingface.co/papers/2605.10730</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233009530.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-Language Models, Video Generation Models, Closed-loop Framework, Visual Reasoning, Step-level Granularity</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Propose a closed-loop framework, CollabVR, to enhance visual reasoning in video generation by integrating vision-language models with video generation models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework couples VLMs and VGMs at step-level granularity, enabling real-time failure detection and correction to improve performance on visual reasoning tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CollabVR demonstrates significant improvements over existing VGMs on benchmark tasks, particularly in challenging scenarios, and enhances models even further when combined with reasoning fine-tuning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08735" target="_blank">https://huggingface.co/papers/2605.08735</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233036226.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Visual Typesetting Optimization, vision-in-the-loop agent, document automation, LaTeX, PaperFit</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to address the transformation of error-free compilable LaTeX documents into visually polished and publication-ready PDFs using Visual Typesetting Optimization (VTO).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces PaperFit, a vision-in-the-loop agent that iteratively renders pages, diagnoses layout defects, and applies constrained repairs through iterative visual verification and source-level revision.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Extensive experiments show PaperFit significantly outperforms baselines, establishing vision-in-the-loop optimization as a necessary stage in the document automation pipeline for converting compilable sources into professional-ready PDFs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10341" target="_blank">https://huggingface.co/papers/2605.10341</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233059661.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WorldReasonBench, WorldRewardBench, video generation, world simulators, reasoning quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce WorldReasonBench and WorldRewardBench as benchmarks to evaluate video generation models&#8217; reasoning abilities about world-state evolution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop WorldReasonBench with 436 test cases and structured QA annotations covering multiple reasoning dimensions.</p>
<p>   &#8211; Implement a two-part methodology for evaluating generated videos based on reasoning verification and quality assessment.</p>
<p>   &#8211; Introduce WorldRewardBench for preference benchmark with expert-annotated pairs to support reward-model evaluation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Despite improvements in commercial video generators, a gap remains between visual plausibility and true world reasoning capabilities.</p>
<p>   &#8211; Benchmarks reveal that videos can appear convincing yet fail in dynamics, causality, or information preservation.</p>
<p>   &#8211; The research supports community development of world-aware video generation models with released benchmarks and toolkits.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10434" target="_blank">https://huggingface.co/papers/2605.10434</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233125164.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. Model Merging Scaling Laws in Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Empirical scaling laws, language model merging, cross-entropy, power law, predictive planning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To identify and explore empirical scaling laws for language model merging, establishing power-law relationships between model size, expert count, and cross-entropy performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Analyzing the relationship between model size and expert number using power laws across diverse architectures and methods such as Average, TA, TIES, DARE.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; A compact power law was identified that links model size and expert count, revealing diminishing returns in expert addition. This enables predictive planning to optimize model composition and transform heuristic merging practices into computationally efficient strategies, suggesting a new scaling principle for distributed generative AI.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2509.24244" target="_blank">https://huggingface.co/papers/2509.24244</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233151428.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MELT, reasoning depth, memory consumption, learnable gating mechanism, iterative reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce MELT, a novel recurrent LLM architecture that decouples reasoning depth from memory consumption, to enable scalable and efficient reasoning operations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employ a single Key-Value (KV) cache shared across reasoning loops with updates via a learnable gating mechanism.</p>
<p>   &#8211; Utilize chunk-wise training in two phases: interpolated transition and attention-aligned distillation from the LoopLM starting model to MELT.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MELT significantly reduces the memory footprint compared to Ouro while maintaining comparable performance, achieving constant-memory iterative reasoning effectively.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07721" target="_blank">https://huggingface.co/papers/2605.07721</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233221466.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. G-Zero: Self-Play for Open-Ended Generation from Zero Data</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: G-Zero, Hint-δ, intrinsic reward, self-evolving LLMs, proxy LLM judges</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a verifier-free, co-evolutionary framework named G-Zero that enables autonomous self-improvement of large language models (LLMs) in unverifiable domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop the Hint-δ mechanism to provide intrinsic rewards by quantifying predictive shifts in LLM responses.</p>
<p>   &#8211; Implement a Proposer model using GRPO to target the Generator model&#8217;s blind spots with challenging queries and informative hints.</p>
<p>   &#8211; Optimize the Generator model through hint-guided improvements using DPO to ensure continuous self-evolution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Proved a best-iterate suboptimality guarantee for G-Zero&#8217;s idealized version, contingent on sufficient exploration coverage by the Proposer and low pseudo-label score noise via data filtering.</p>
<p>   &#8211; By leveraging internal distributional dynamics, G-Zero circumvents external judge capability limitations, offering a scalable pathway for LLMs&#8217; continuous evolution in unverifiable environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09959" target="_blank">https://huggingface.co/papers/2605.09959</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233316810.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. Pixal3D: Pixel-Aligned 3D Generation from Images</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Pixel-aligned 3D generation, 3D-native generators, Fidelity, Back-projection conditioning, Multi-view generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Pixal3D, a new pixel-aligned approach aimed at improving high-fidelity 3D asset creation from images by addressing fidelity issues arising from implicit 2D-3D correspondence problems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method involves a pixel back-projection conditioning scheme that lifts multi-scale image features into a 3D feature volume, creating direct pixel-to-3D correspondence to maintain consistency with the input view and extend to multi-view generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Pixal3D demonstrates scalable, high-quality 3D asset generation with substantial fidelity improvements, enabling 3D-native pixel-aligned generation, which benefits high-fidelity scene synthesis from single or multi-view images.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10922" target="_blank">https://huggingface.co/papers/2605.10922</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233250150.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-distillation, Reinforcement Learning, Information Asymmetry, Exploration, RLRT</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance self-distillation in reinforcement learning by utilizing successful student decisions that diverge from teacher predictions for more effective exploration.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; RLRT is proposed as a method that reinforces successful student decisions that differ from teacher predictions, building on the original self-distillation signal by reversing it.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RLRT significantly outperforms traditional self-distillation and exploration-based baselines by establishing information asymmetry as a new and principled design axis for reinforcement learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10781" target="_blank">https://huggingface.co/papers/2605.10781</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233343227.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Activation steering, KV-cache contamination, GCAD, coherence drift, token-level gating</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the challenge of KV-cache contamination in language models during dialogue settings, improving long-horizon coherence by proposing the Gated Cropped Attention-Delta steering (GCAD) technique.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers developed GCAD, which extracts steering signals from prompt contributions to self-attention and applies them with token-level gating to enhance coherence in language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GCAD significantly improves coherence drift and trait expression in multi-turn benchmarks, suggesting that activation steering becomes more reliable when aligned with prompt-mediated pathways used for behavioral control.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10664" target="_blank">https://huggingface.co/papers/2605.10664</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233438106.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: global retention, KV eviction, memory budget, lightweight retention gates, attention dilution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Improve long-context reasoning by selectively retaining useful tokens while reducing memory usage.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce a global retention-based key-value (KV) eviction method using lightweight retention gates to assign utility scores, with a shared final scoring projection for calibration across layers and heads.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The new method substantially reduces KV memory while maintaining or improving performance in long-context language, vision-language reasoning, and multi-turn dialogue benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09649" target="_blank">https://huggingface.co/papers/2605.09649</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233411309.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Structured Pruning, Knowledge Distillation, Mixture-of-Experts, Pretraining Scale, Progressive Pruning Schedule</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To systematically study the compression of mixture-of-experts models during large-scale pretraining.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Application of structured pruning and knowledge distillation techniques, evaluation of initialization versus training from scratch, investigation of expert compression methods, and introduction of a partial-preservation expert merging strategy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Pruning pretrained mixture-of-experts models consistently outperforms training from scratch.</p>
<p>   &#8211; Different one-shot expert compression methods reach similar performance levels after extensive pretraining.</p>
<p>   &#8211; Combining knowledge distillation with language modeling loss improves performance, especially on knowledge-intensive tasks, and introduces multi-token prediction distillation.</p>
<p>   &#8211; Progressive pruning schedules lead to better optimization trajectories compared to one-shot compression.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08738" target="_blank">https://huggingface.co/papers/2605.08738</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233504276.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI Native, personalization, multi-agent framework, research automation, procedural knowledge</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to improve research automation by developing NanoResearch, a multi-agent framework that personalizes assistance through accumulated skills, user-specific experience, and internalized implicit preferences.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; NanoResearch employs a tri-level co-evolution approach with a skill bank for procedural knowledge, a memory module for user-specific experience, and label-free policy learning for implicit preferences adaptation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; NanoResearch demonstrates significant enhancements over existing AI research systems, offering better research outputs at reduced costs via its iterative refinement process.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10813" target="_blank">https://huggingface.co/papers/2605.10813</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233556871.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Language Representation, Large Language Models, Knowledge Activation, Schema, AI Native</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to enhance the intelligence of Large Language Models by focusing on advanced language representation design without scaling or parameter modifications.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors conducted a review of recent empirical practices and emerging methodologies, alongside controlled experiments to demonstrate the effects of different language representations on LLM performance and internal feature activations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Findings suggest significant performance gains in LLMs can be achieved through deliberate language representation design, highlighting it as a promising direction for future research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09271" target="_blank">https://huggingface.co/papers/2605.09271</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233531783.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Entrocraft, Rejection-sampling, Entropy schedule, Performance saturation, Generalization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces Entrocraft, aiming to address performance saturation in large language models by customizing entropy schedules and enhancing generalization and training longevity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; It utilizes a rejection-sampling approach which biases advantage distributions and does not require regularization, theoretically linking per-step entropy changes to advantage distribution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Entrocraft significantly improves the generalization, output diversity, and longevity of training in reinforcement learning; remarkably, it enabled a 4B model to outperform an 8B baseline and sustained improvement four times longer than existing benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.26326" target="_blank">https://huggingface.co/papers/2604.26326</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233626239.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DeltaRubric, Multimodal Large Language Models, Disagreement Planner, Checklist Verifier, Visual Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to improve reward modeling reliability for multimodal large language models by introducing a dynamic, two-step evaluation approach called DeltaRubric.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; DeltaRubric utilizes a plan-and-execute process with a single MLLM, where it first acts as a Disagreement Planner to create instance-specific verification checklists, and then as a Checklist Verifier to execute these checks and produce grounded judgments.</p>
<p>   &#8211; Formulated as a multi-role reinforcement learning problem to optimize planning and verification.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DeltaRubric significantly enhances the performance of MLLMs in multimodal preference evaluation, with empirical gains demonstrated on Qwen3-VL models, improving accuracy on VL-RewardBench by 22.6 and 18.8 points for different model sizes.</p>
<p>   &#8211; Decomposing evaluation into structured, verifiable steps leads to more reliable and generalizable reward modeling.  </p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09269" target="_blank">https://huggingface.co/papers/2605.09269</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233653228.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. ELF: Embedded Language Flows</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continuous embedding space, Diffusion language models, AI-generated summary, Classifier-free guidance, Token space</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the effectiveness of continuous diffusion models in language processing by operating in embedding space rather than discrete token space.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of Embedded Language Flows (ELF), a class of diffusion models based on continuous-time flow matching, which predominantly remain in continuous embedding space until mapping to discrete tokens using a shared-weight network.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ELF significantly surpasses existing discrete and continuous diffusion language models in terms of generation quality, requiring fewer sampling steps, and demonstrates effective adaptation of image-domain techniques to language modeling.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10938" target="_blank">https://huggingface.co/papers/2605.10938</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233744101.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Safety alignment, Refusal neurons, Harmful knowledge, Concept neurons, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the mechanisms of safety alignment in language models, focusing on how specific neurons control harmful knowledge expression and refusal behavior.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Analyzing the role of individual neurons across seven models spanning two families and 1.7B to 70B parameters, using neither additional training nor prompt engineering to test their influence on safety measures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research concludes that safety alignment is mediated by specific neurons, and that either suppression or activation of identified neurons can bypass safety measures, suggesting a concentration of safety control within individual neurons.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08513" target="_blank">https://huggingface.co/papers/2605.08513</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233718839.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. Can Muon Fine-tune Adam-Pretrained Models?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Optimizer mismatch, Adam, Muon, Fine-tuning, LoRA</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the optimizer mismatch between Adam and Muon during fine-tuning and its impact on performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conducted controlled experiments to study the distinct implicit biases of Adam and Muon that lead to performance degradation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The optimizer mismatch disrupts pretrained knowledge, with the severity correlating with the update strength. Methods like LoRA can effectively mitigate this mismatch, reducing the performance gap across language and vision tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10468" target="_blank">https://huggingface.co/papers/2605.10468</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233811757.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-policy distillation, Per-token supervision, Teacher model, Self-distillation, Gradient alignment score</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to identify optimal teacher models and contexts for reasoning model training through a training-free diagnostic framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces a diagnostic framework analyzing per-token distillation signals without the need for costly training, using an ideal per-node gradient and a targeted-rollout algorithm to efficiently estimate gradient alignment scores.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Findings indicate distillation guidance aligns more with the ideal on incorrect rollouts, emphasizing the importance of per-task, per-token analyses as no universal distillation context is effective across scenarios.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10889" target="_blank">https://huggingface.co/papers/2605.10889</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233835335.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. Crosslingual On-Policy Self-Distillation for Multilingual Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language models, mathematical reasoning, low-resource languages, self-distillation, Crosslingual On-Policy Self-Distillation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective: </p>
<p>   &#8211; The paper aims to enhance the mathematical reasoning abilities of low-resource languages using the COPSD method by transferring reasoning behaviors from high-resource language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method employs Crosslingual On-Policy Self-Distillation, where the same model is used as both student and teacher. The student works with low-resource problems, while the teacher has access to crosslingual contexts, minimizing token-level divergence in the learning process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; COPSD demonstrates consistent improvements in mathematical reasoning across various model sizes in 17 low-resource African languages. It outperforms existing methods like Group Relative Policy Optimization and enhances answer format adherence, test-time scaling, and generalizes well to complex multilingual benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09548" target="_blank">https://huggingface.co/papers/2605.09548</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233957969.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DECO, Mixture-of-Experts, sparse MoE, dense Transformers, ReLU-based routing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to develop DECO, a sparse Mixture-of-Experts architecture, to achieve the performance of dense Transformers while reducing computational and storage overhead.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; DECO employs techniques such as ReLU-based routing with learnable expert-wise scaling, NormSiLU activation function, and the use of non-gated MLP experts to optimize performance and efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DECO activates only 20% of experts, yet matches the performance of dense models and surpasses existing MoE baselines. The approach offers a 3.00 times speedup in processing on actual hardware.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10933" target="_blank">https://huggingface.co/papers/2605.10933</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233931750.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PlantMarkerBench, AI-assisted plant biology, literature-grounded biological evidence, Open-weight models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce PlantMarkerBench as a multi-species benchmark for evaluating literature-based plant marker evidence interpretation and categorization across four species.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a modular curation pipeline incorporating large-scale literature retrieval, hybrid search, species-aware biological grounding, structured evidence extraction, and human review.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Although frontier models show strong performance on direct expression evidence, they struggle with functional, indirect, and weak-support evidence, and encounter challenges with evidence-type confusion and false-positive rates in ambiguous contexts.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10032" target="_blank">https://huggingface.co/papers/2605.10032</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233904412.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. Dystruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Diffusion language models, Bayesian structured decoding, flexible-length generation, dynamic structural inference, parallel decoding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose a training-free, Bayesian structured decoding framework that allows for flexible-length generation in diffusion language models without the need for retraining.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employs dynamic structural inference to simultaneously compute expansion length, block boundaries, and decoding schedule by integrating local uncertainty with structural signals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed approach significantly improves the generation quality and flexibility of text over existing fixed-length and flexible-length baselines, highlighting the effectiveness of Bayesian structured decoding for structured text generation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09820" target="_blank">https://huggingface.co/papers/2605.09820</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234024480.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>25. LLiMba: Sardinian on a Single GPU &#8212; Adapting a 3B Language Model to a Vanishing Romance Language</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Sardinian language model, AI-generated summary, continued pretraining, supervised fine-tuning, LoRA</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop a 3-billion-parameter Sardinian language model using limited computational resources to improve translation tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized continued pretraining (CPT) and supervised fine-tuning (SFT) on a 24 GB consumer GPU, leveraging a corpus of 11.5 million Sardinian tokens and 2.4 million related Romance tokens.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The model reached superior perplexity scores and outperformed baseline models in translation tasks. The rsLoRA r256 configuration achieved the best performance, indicating that adapter capacity significantly influences results. Stronger regularization was not always beneficial, and translation metrics provided a clear ordering among varying qualitative behaviors.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09015" target="_blank">https://huggingface.co/papers/2605.09015</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234119651.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>26. Sub-JEPA: Subspace Gaussian Regularization for Stable End-to-End World Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Joint-Embedding Predictive Architectures, Gaussian constraints, bias-variance tradeoff, latent representations, random subspaces</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve the training of Joint-Embedding Predictive Architectures (JEPA) by applying Gaussian constraints in multiple random subspaces to achieve better bias-variance balance in continuous-control environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Applied Gaussian constraints within multiple random subspaces instead of the original embedding space, aiming for improved training stability and representation flexibility by relaxing global constraints.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method outperforms the recent LeWorldModel (LeWM) in continuous-control environments, providing a strong baseline for future JEPA-based world model research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09241" target="_blank">https://huggingface.co/papers/2605.09241</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234050517.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>27. InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: InfoLaw, data-aware scaling framework, model loss, data mixture weights, repetition</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce InfoLaw, a data-aware scaling framework that efficiently predicts model loss based on token consumption, model size, data mixture weights, and repetition.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Collected performance data by training models on datasets with varying scale, quality distribution, and repetition levels.</p>
<p>   &#8211; Developed a pretraining model where quality controls information density and repetition affects diminishing returns.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; InfoLaw accurately predicts performance on unseen data recipes with minimal error and extrapolates reliably across various training scales, enabling efficient data-recipe selection under differing compute budgets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02364" target="_blank">https://huggingface.co/papers/2605.02364</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234508159.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>28. A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Learning-rate steps, local beliefspace calculation, projected forward step, probability simplex, contractivity</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study investigates the admissibility of learning-rate steps using a framework characterized by contractivity in KL/Bregman geometry, aiming to provide an upper bound as a formula rather than as a hyperparameter.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research treats learning-rate steps as steps on the probability simplex, focusing on a localized calculation termed local beliefspace calculation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The paper concludes that under this model, the upper bound for an admissible learning-rate step is determined formulaically, highlighting a novel approach to parameter tuning in machine learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06741" target="_blank">https://huggingface.co/papers/2605.06741</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234443152.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>29. 100,000+ Movie Reviews from Kazakhstan: Russian, Kazakh, and Code-Switched Texts</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multilingual Dataset, Sentiment Polarity, Polarity Classification, Multilingual Transformer Models, Class Imbalance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a multilingual movie review dataset from Kazakhstan, annotated for language and sentiment sentiments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized classical Bag of Words (BoW) and TF-IDF baselines, compared against multilingual transformer models such as mBERT, XLM-RoBERTa, and RemBERT for sentiment polarity and score classification tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Transformer models outperformed classical baselines in polarity classification, yet score classification remains challenging due to class imbalance and subtle rating distinctions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08600" target="_blank">https://huggingface.co/papers/2605.08600</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234409028.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>30. Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Shepherd, Functional Programming, Meta-Agent, Lean, Git-like Execution Trace</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Shepherd as an efficient infrastructure for programming meta-agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Formalizes meta-agent operations using functional programming and Lean.</p>
<p>   &#8211; Uses Git-like execution trace for recording interactions and fast process forking.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrates runtime intervention, counterfactual meta-optimization, and Tree-RL training, significantly improving benchmark performances.</p>
<p>   &#8211; Open-sources the system to foster future research and innovations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10913" target="_blank">https://huggingface.co/papers/2605.10913</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234340408.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>31. RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RoboMemArena, PrediMem, vision-language model, memory management, real-world evaluation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address current limitations in robotic memory benchmarks by introducing RoboMemArena, which features a comprehensive set of 26 tasks and emphasizes real-world evaluation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a vision-language model (VLM) to design subtasks and compose task trajectories, while providing extensive memory-related annotations and enabling physical evaluation with real-world memory tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The newly developed PrediMem system, which incorporates a dual-system vision-language architecture and predictive coding, exceeds existing baselines in managing complex memory systems and provides valuable insights into memory management and model scalability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10921" target="_blank">https://huggingface.co/papers/2605.10921</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234312553.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>32. Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Lexical Retriever, Large Language Models, Pi-Serini, BM25, Retrieval Depth</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate the effectiveness of lexical retrievers compared to dense retrievers in deep research tasks when paired with advanced Large Language Models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Pairing the BM25 lexical retriever with frontier LLMs like gpt-5.5 to assess performance on tasks like BrowseComp-Plus, supported by the search agent Pi-Serini.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Lexical retrievers, when effectively configured for retrieval depth, can outperform dense retrievers in terms of answer accuracy and evidence recall.</p>
<p>   &#8211; Pi-Serini achieved 83.1% answer accuracy and 94.7% surfaced evidence recall.</p>
<p>   &#8211; Tuning BM25 and increasing retrieval depth significantly improves performance metrics.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10848" target="_blank">https://huggingface.co/papers/2605.10848</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234245360.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>33. SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: novel view synthesis, Gaussian primitives, feed-forward approach, high-frequency prior, pixel-level routing scheme</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve rendering quality in novel view synthesis by dynamically allocating 3D Gaussian primitives based on spatial complexity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of SplatWeaver, which employs cardinality Gaussian experts and a pixel-level routing scheme to determine adaptive allocation of Gaussian primitives.</p>
<p>   &#8211; Utilization of a high-frequency prior with a guidance module and routing regularization for stability and complexity-aware allocation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SplatWeaver consistently outperforms existing methods, delivering more accurate novel-view renderings with fewer Gaussian primitives by allocating resources based on scene complexity.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07287" target="_blank">https://huggingface.co/papers/2605.07287</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234214263.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>34. jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: frozen-encoder model composition, multimodal embedding, semantic embedding space, Jina Embeddings v5, language model</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a novel approach called frozen-encoder model composition to enhance multimodal embedding efficiency while maintaining text embedding consistency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize VLM-style architecture and extend Jina Embeddings v5 Text models by adding specialized encoders for images and audio, with only the connecting components being trained.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; This method efficiently produces competitive results against state-of-the-art multimodal embedding models, with nearly equal performance despite significantly reduced training requirements.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08384" target="_blank">https://huggingface.co/papers/2605.08384</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234147861.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>35. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605121778629516.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>36. Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: zero-shot, dense hand contact estimation, multi-modal large language models, 3D hand geometry, multi-stage contact reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to develop ContactPrompt, a zero-shot approach for dense hand contact estimation using multi-modal large language models (MLLMs) that addresses the challenges of 3D hand geometry encoding and fine-grained contact prediction.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces a detailed hand-part segmentation and vertex-grid representation to effectively encode 3D hand geometry using MLLMs.</p>
<p>   &#8211; Develops multi-stage structured contact reasoning to bridge global semantics and fine-grained geometry progressively.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ContactPrompt enables precise dense hand contact estimation leveraging MLLMs&#8217; reasoning capabilities without requiring training, outperforming prior supervised methods on large-scale dense contact datasets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05886" target="_blank">https://huggingface.co/papers/2605.05886</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234455739.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>37. Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI Native, capability signal, safety signal</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Human-AI Interaction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Examine the distinction between safety outcomes and capability signals in phone-use agents during critical moments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; PhoneSafety benchmark of 700 safety-critical situations, assessing agents&#8217; actions in real-world phone interactions across more than 130 apps.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Strong phone-use ability does not ensure safer choices, as models may still fail to act correctly in risky moments.</p>
<p>   &#8211; Failures often indicate a capability issue, where agents either make unsafe choices or fail to act in challenging environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07630" target="_blank">https://huggingface.co/papers/2605.07630</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234427504.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>38. TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TD3B, AI-generated summary, generative framework, directional transition control, G protein-coupled receptors</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce TD3B to design allosteric binders with specified agonist or antagonist behavior through directional transition control, addressing gaps in existing structure-based methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize a target-aware Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model to generate targeted agonists and antagonists.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TD3B successfully decouples agonist and antagonist generation from binding affinity, overcoming limitations of equilibrium-based or inference-only models, thereby enhancing therapeutic efficacy for GPCRs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09810" target="_blank">https://huggingface.co/papers/2605.09810</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234354642.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>39. The Alpha Blending Hypothesis: Compositing Shortcut in Deepfake Detection</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Deepfake detection, Alpha Blending Hypothesis, Cross-dataset generalization, Self-blended images, Compositional deepfake datasets</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore the underlying mechanisms of deepfake detection methods and introduce the Alpha Blending Hypothesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proposed the method BlenD, which utilizes a large-scale dataset of real-only facial images augmented with self-blended images for enhanced cross-dataset generalization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated that deepfake detectors are effective in identifying compositing artifacts with high sensitivity to self-blended images, achieving state-of-the-art AUROC of 94.0% when combined in an ensemble configuration.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10334" target="_blank">https://huggingface.co/papers/2605.10334</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234325753.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>40. CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: capability vectors, auxiliary training objectives, standard supervised finetuning, orthogonal regularization, meta model</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance model capabilities and reduce computational overhead during standard supervised finetuning by decoupling auxiliary training objectives.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of capability vector merging and a lightweight orthogonal regularization loss to form capability-enhanced meta models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method achieves improved performance and reduced computational requirements, demonstrating the versatility and effectiveness of capability vectors across diverse models and environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10903" target="_blank">https://huggingface.co/papers/2605.10903</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234259442.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>41. Uncovering Entity Identity Confusion in Multimodal Knowledge Editing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal Knowledge Editing, Entity Identity Confusion, Image-Entity Binding, EC-Bench, Vision-Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address systemic failure modes in multimodal knowledge editing by identifying and mitigating Entity Identity Confusion in large vision-language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers developed EC-Bench, a diagnostic benchmark to analyze changes in image-entity bindings before and after model editing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study concludes that Entity Identity Confusion arises from inadequate differentiation between Image-Entity and Entity-Entity knowledge, leading to incorrect label associations. Strategies that focus on constraining edits to the model&#8217;s Image-Entity binding process can significantly reduce this confusion.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06096" target="_blank">https://huggingface.co/papers/2605.06096</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234229253.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>42. Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: data-adaptive method, low-rank adaptation, attention-based routing, parameter-efficient fine-tuning, instruction-regularization </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a data-adaptive method for parameter-efficient fine-tuning of large neural networks that combines scalability with dynamic, context-sensitive updates.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A shared queryable memory of low-rank update atoms is used with attention-based routing to dynamically adapt layer updates. Incorporates instruction-regularization to bias updates semantically using a language-induced prior.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; This approach improves test performance and training stability on tasks such as noisy non-linear regression and LLM fine-tuning, maintaining efficiency similar to standard low-rank adaptation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08423" target="_blank">https://huggingface.co/papers/2605.08423</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234201521.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>43. Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SGRPO, Biomolecular Generation, Diversity Rewards, Utility-Diversity Pareto Frontier</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance biomolecular generation by introducing a framework, SGRPO, that incorporates set-level diversity rewards to improve utility and diversity across multiple design tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The development and evaluation of the Supergroup Relative Policy Optimization (SGRPO) framework, which decouples from specific generators and measures, allowing for instantiation with GRPO-style approaches. The framework applies to various molecular design tasks using autoregressive and discrete diffusion generators.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SGRPO effectively expands the utility-diversity Pareto frontier, achieving superior metrics compared to pretrained and other GRPO models. Direct set-level diversity rewards maintain effectiveness with small groups, preserving broad distribution coverage post-training.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08659" target="_blank">https://huggingface.co/papers/2605.08659</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234133060.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>44. FORTIS: Benchmarking Over-Privilege in Agent Skills</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language model agents, skill layer, privilege boundary, over-privilege, privilege escalation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to examine the behavior of Large language model agents in relation to privilege boundaries and their tendency to exceed necessary privileges during skill execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of FORTIS, a benchmark used to evaluate over-privilege behavior in agent skills, assessing whether a model selects the minimally sufficient skill and executes it without overstepping its boundaries.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Findings reveal that over-privilege is common in current models, where they often opt for higher-privilege skills than required. This tendency is exacerbated under realistic user interaction conditions, indicating that the skill layer may contribute to privilege escalation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09163" target="_blank">https://huggingface.co/papers/2605.09163</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234103290.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>45. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SimWorld Studio, Unreal Engine 5, embodied agents, 3D environments, self-evolution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce SimWorld Studio, an open-source platform designed to generate evolving 3D environments using Unreal Engine 5 and SimCoder for embodied agent training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of SimCoder, a skill-augmented coding agent, to create physically grounded 3D worlds based on language/image instructions and feedback-driven self-evolution mechanisms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The generated environments significantly enhance embodied agent performance and adaptability, achieving notable success-rate improvements when compared to static and untrained scenarios.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09423" target="_blank">https://huggingface.co/papers/2605.09423</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234037315.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>46. Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Tokenizer-free language models, Patch-based approaches, Patch lag, Scratchpad Patching, Next-byte prediction entropy</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This research aims to address the trade-off between compute efficiency and modeling quality in tokenizer-free language models with patch-based approaches by introducing Scratchpad Patching.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of transient scratchpads within each patch, using next-byte prediction entropy to allocate compute resources dynamically, enhances modeling quality while maintaining efficiency in natural language and code experiments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Scratchpad Patching improves the model&#8217;s quality at the same patch size, significantly reducing KV-cache and inference compute footprint, and effectively matches or approaches byte-level baseline performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09630" target="_blank">https://huggingface.co/papers/2605.09630</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512234010947.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>47. Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal large language models, Reinforcement learning, Numerical regression, Long-tailed distributions, Batch-level comparison</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to improve the numerical regression performance of Multimodal Large Language Models (MLLMs) when dealing with long-tailed target distributions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors propose a distribution-aware reinforcement learning framework using Group Relative Policy Optimization, incorporating batch-level comparison-based supervision to enhance prediction accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed framework consistently outperforms traditional supervised fine-tuning and existing MLLM regression methods, particularly in medium- and few-shot regimes, by effectively aligning predicted and actual distributions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01402" target="_blank">https://huggingface.co/papers/2605.01402</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233944438.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>48. DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DeepRefine, Agent-compiled knowledge bases, Multi-turn interactions, Reinforcement learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance the quality of agent-compiled knowledge bases for better performance in open-ended, knowledge-intensive tasks by using a reasoning model called DeepRefine.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study utilizes multi-turn interactions and abductive diagnosis to identify defects in knowledge bases, employing targeted updates and reinforcement learning with a Gain-Beyond-Draft (GBD) reward system for optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DeepRefine demonstrates consistent improvements in downstream task performance compared to strong baseline methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10488" target="_blank">https://huggingface.co/papers/2605.10488</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233918098.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>49. GridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: GridProbe, VLMs, attention cost, Shape-Adaptive Selection, interpretability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance long-video understanding in Visual Language Models (VLMs) by reducing attention costs and maintaining accuracy through a novel frame selection method.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of GridProbe, a training-free posterior-probing inference paradigm that uses a frozen VLM&#8217;s reasoning to adaptively select frames, minimizing attention costs with interpretability provided by importance maps.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GridProbe achieves sub-quadratic attention cost with minimal accuracy loss, significantly reducing computational requirements (up to 3.36x TFLOPs reduction) in comparison to the monolithic baseline, proving effective in various benchmarks without the need for retraining.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10762" target="_blank">https://huggingface.co/papers/2605.10762</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233848706.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>50. MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MuSS, multi-shot video generation, narrative logic, Subject-to-Video, cross-shot matching</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to address narrative logic, spatiotemporal alignment, and &#8220;copy-paste&#8221; issues in multi-shot and Subject-to-Video generation through a novel dataset and mechanisms to enhance cinematic storytelling in AI-generated videos.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of MuSS, a large-scale, dual-track dataset sourced from over 3,000 movies to support multi-shot transitions and subject-centric narratives using a progressive captioning pipeline and a cross-shot matching mechanism.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study demonstrates that the MuSS-augmented model achieves superior narrative effectiveness and identity preservation compared to current baselines, which struggle with continuous narrative logic and often result in simplistic 2D outputs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.23789" target="_blank">https://huggingface.co/papers/2604.23789</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233822991.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>51. Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Apple Silicon, optimization regimes, evolutionary loop, frozen LLM, silent regression</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces Metal-Sci, a benchmark for scientific computing kernels on Apple Silicon, aiming to optimize performance through an evolutionary loop involving a large language model.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The approach employs a lightweight automatic kernel search framework that compiles candidates, evaluates them against a fitness function, and integrates structured diagnostics into a frozen large language model to enhance performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study demonstrates significant speedups in computational tasks, highlighting structural methodologies like a held-out gate scoring function to identify and manage silent regressions that in-distribution scores might miss.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09708" target="_blank">https://huggingface.co/papers/2605.09708</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233756044.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>52. FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: FlashEvolve, LLM-based evolution, asynchronous execution, artifact version tracking, language-space staleness</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance LLM-based evolution frameworks by reducing computational bottlenecks while maintaining evolutionary quality through asynchronous execution and artifact version tracking.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented a framework called FlashEvolve that uses asynchronous workers and queues, along with policies to update or patch stale artifacts to handle data staleness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FlashEvolve significantly increases proposal throughput and token efficiency, improving performance by 3.5 times on local vLLM and 4.9 times on API serving workloads compared to synchronous systems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08520" target="_blank">https://huggingface.co/papers/2605.08520</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233730949.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>53. AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: multi-agent systems, long-horizon tasks, online auditing, risk-anticipation prior, step-level localization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce AgentForesight to enable real-time error detection in multi-agent systems during trajectory execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize online auditing with coarse-to-fine reinforcement learning to develop AgentForesight-7B, a compact online auditor.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AgentForesight-7B achieves significant performance gains, outperforming leading proprietary models like GPT-4.1 and DeepSeek-V4-Pro in error detection and localization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08715" target="_blank">https://huggingface.co/papers/2605.08715</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233705755.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>54. Reinforcing Multimodal Reasoning Against Visual Degradation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ROMA, Multimodal Large Language Models, visual degradation, Reinforcement Learning, policy collapse</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the robustness of Multimodal Large Language Models (MLLMs) against visual degradations using the ROMA RL fine-tuning framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Deployment of a dual-forward-pass strategy with teacher forcing to assess corrupted views and maintain clean-input performance.</p>
<p>   &#8211; Implementation of a token-level surrogate KL penalty to ensure distributional consistency and avoid worst-case scenario augmentations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ROMA successfully improves robustness by +2.4% on seen and +2.3% on unseen visual corruptions for MLLMs, maintaining accuracy on clean data.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09262" target="_blank">https://huggingface.co/papers/2605.09262</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233639514.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>55. Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Omni-Persona, omnimodal personalization, Persona Modality Graph, Calibrated Accuracy, cross-modal routing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Omni-Persona, the first comprehensive benchmark for omnimodal personalization, aiming to unify text, image, and audio modalities in personalization research.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The approach involves formalizing the task as cross-modal routing over the Persona Modality Graph, encompassing 4 task groups and 18 fine-grained tasks, with the inclusion of the Calibrated Accuracy metric to evaluate grounding behaviors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study reveals a consistent audio-vs-visual grounding gap in open-source models, proposes Calibrated Accuracy as a separate evaluation axis, and demonstrates that SFT is limited by the scalability of annotated ground-truth supervision, while RLVR requires careful reward design to maintain generation quality.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09996" target="_blank">https://huggingface.co/papers/2605.09996</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233610293.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>56. Conformal Agent Error Attribution</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multi-agent systems, Error attribution, Conformal prediction, Sequential data, Model-agnostic</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to develop a conformal prediction framework for error attribution in multi-agent systems, enhancing automated recovery through precise error isolation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces new filtration-based conformal prediction algorithms for sequential data, enabling prediction of contiguous sequences for efficient debugging and recovery in multi-agent systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The approach is validated across different agents and datasets, effectively isolating errors and assisting in rolling back multi-agent systems to correct states through prediction sets, while being model-agnostic.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06788" target="_blank">https://huggingface.co/papers/2605.06788</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233544018.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>57. Mela: Test-Time Memory Consolidation based on Transformation Hypothesis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Memory-augmented Transformer, Hierarchical Memory Module, Multi-granularity Memory Representations, Memory Consolidation, Transformer-based Language Decoder</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose a new memory-augmented Transformer architecture, Mela, that enhances long-context language modeling through the integration of hierarchical memory modules inspired by neuroscientific theories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized Hierarchical Memory Module (HMM) within a Transformer-based language decoder to achieve online memory consolidation.</p>
<p>   &#8211; Incorporated MemStack method to effectively distribute multi-granularity memory representations across decoder layers without adding extra tokens.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Mela outperforms existing Transformer models in language modeling for all model sizes, maintaining performance on longer contexts beyond the training length.</p>
<p>   &#8211; Extensive ablation studies confirm the effectiveness and provide configuration guidance of the proposed components.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10537" target="_blank">https://huggingface.co/papers/2605.10537</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233519454.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>58. SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SlimSpec, speculative decoding, Large Language Models, low-rank parameterization, end-to-end speedup</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve speculative decoding efficiency by compressing the drafter&#8217;s language model head with low-rank parameterization, preserving full vocabulary support, and achieving significant speedup.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized low-rank parameterization in the LM-head of a draft model to compress the inner representation rather than the output.</p>
<p>   &#8211; Evaluated using the EAGLE-3 drafter over three target models and multiple benchmarks to assess performance in latency- and throughput-bound inference scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SlimSpec achieved 4-5 times acceleration over the standard LM-head architecture while maintaining competitive acceptance length.</p>
<p>   &#8211; Demonstrated up to 8-9% end-to-end speedup compared to existing methods, requiring minimal adjustments to training and inference pipelines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10453" target="_blank">https://huggingface.co/papers/2605.10453</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233452558.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>59. RigidFormer: Learning Rigid Dynamics using Transformers</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RigidFormer, mesh-free rigid-body dynamics, Transformer-based model, Anchor-based RoPE, permutation-equivariant</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a RigidFormer model for simulating mesh-free rigid-body dynamics efficiently and accurately using object-centric processing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a Transformer-based approach with anchor-based attention mechanisms, Anchor-Vertex Pooling, and Anchor-based RoPE to enhance simulation fidelity while respecting the unordered structure of inputs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RigidFormer outperforms traditional mesh-based baselines in benchmark tests, efficiently scales to large numbers of objects, generalizes across unseen datasets, and offers preliminary extensions to articulated bodies.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09196" target="_blank">https://huggingface.co/papers/2605.09196</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260512233424090.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>60. LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Visual encoding, Multimodal Large Language Models, slice-based encoding, intra-ViT compression, high-resolution inputs</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the research is to enhance visual encoding efficiency for high-resolution inputs in Multimodal Large Language Models by exploring slice-based encoding and intra-ViT early compression.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs controlled experiments to compare slice-based encoding versus global encoding, devising a framework that integrates intra-ViT early compression into the slice-based encoding process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research introduces LLaVA-UHD v4, a visual encoding scheme that reduces computational costs by 55.8% while maintaining or surpassing baseline performance across various benchmarks. This work demonstrates that significant visual-encoding efficiency improvements are possible without performance compromise.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08985" target="_blank">https://huggingface.co/papers/2605.08985</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233355761.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>61. Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SLIM framework, agentic reinforcement learning, Skill Lifecycle Management, dynamic optimization variable, policy learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces the SLIM framework to manage dynamic skill lifecycles in agentic reinforcement learning by jointly optimizing active skill sets alongside policy learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The SLIM framework uses leave-one-skill-out validation to estimate each active skill&#8217;s marginal external contribution and applies skill lifecycle operations: retaining high-value skills, retiring low-contribution skills, and expanding skill coverage when necessary.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The SLIM framework surpasses existing baselines with a 7.1% improvement across tasks in ALFWorld and SearchQA, demonstrating that policy learning and external skill retention can coexist, supporting SLIM as a general paradigm for skill-based agentic RL.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10923" target="_blank">https://huggingface.co/papers/2605.10923</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233331914.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>62. Key-Value Means</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Key-Value Means, transformer, attention, chunked RNN, sublinear state growth</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introducing Key-Value Means (KVM), a novel mechanism integrating transformer and RNN capabilities with control over computational complexity and memory usage.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing KVM in a transformer model to allow both fixed-size and growing states with little parameter increase, and demonstrating competitive performance on long-context tasks with efficient prefill and state growth.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated the benefits of KVM which include expandable context memory, efficient chunk-wise parallelizable training, and memory savings, all with standard operations and hybrid solutions for long-context decoding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09877" target="_blank">https://huggingface.co/papers/2605.09877</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233303742.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>63. X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: X-OmniClaw, multimodal understanding, mobile agent, AI Native, Android environments</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces X-OmniClaw, a unified mobile agent architecture designed to enable multimodal understanding and intelligent interaction within Android environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The architecture integrates layers such as Omni Perception for multimodal ingress, Omni Memory for optimized task continuity and context-awareness, and Omni Action combining XML metadata with visual perception. Techniques like Behavior Cloning and Trajectory Replay are used to capture and replay user skills.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; X-OmniClaw enhances interaction efficiency and task reliability, serving as a structural blueprint for future mobile-native personal assistants.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05765" target="_blank">https://huggingface.co/papers/2605.05765</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233237039.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>64. Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Auto-Rubric as Reward (ARR), Rubric Policy Optimization (RPO), multimodal alignment, implicit preference knowledge, reward modeling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to align multimodal generative models with human preferences through structured rubrics and improve policy gradients with binary rewards.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; An Auto-Rubric as Reward (ARR) framework is developed that externalizes implicit preference knowledge into structured rubrics, along with Rubric Policy Optimization (RPO) that stabilizes policy gradients.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The ARR and RPO framework demonstrates improved reliability and data efficiency in multimodal alignment on benchmarks in text-to-image generation and image editing compared to traditional models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08354" target="_blank">https://huggingface.co/papers/2605.08354</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233205783.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>65. Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: continual post-training, large language models, catastrophic forgetting, task geometry, geometry conflict</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study investigates how task geometry affects the continual post-training process in large language models, identifying geometry conflict as a significant factor in forgetting and a mechanism for managing update integration.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research examines task geometry by analyzing each post-training task through its parameter updates and studies the covariance geometry produced by these updates. It proposes the Geometry-Conflict Wasserstein Merging (GCWM) method, which employs Gaussian Wasserstein barycenters for data-free update integration.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study concludes that forgetting occurs due to misalignment of covariance geometries, which affects update integration. Sequential updates are successfully transferred when compatible with existing model states, but interference arises with high geometry conflict. The proposed GCWM method improves retention and performance compared to data-free baselines in the tested settings.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09608" target="_blank">https://huggingface.co/papers/2605.09608</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233139271.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>66. SEIF: Self-Evolving Reinforcement Learning for Instruction Following</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-Evolving Reinforcement Learning, Instruction Following, Large Language Models, Model Capability Evolution, Instruction Difficulty Evolution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the instruction-following capabilities of Large Language Models (LLMs) using a self-evolving reinforcement learning framework known as SEIF.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SEIF employs a closed self-evolution loop involving iterative difficulty adaptation and co-training of Instructor and Follower components. It uses roles like Instructor, Filter, Follower, and Judger to improve model performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SEIF consistently enhances instruction-following performance across multiple model scales and architectures, showing strong generality. The key to improvement involves sufficient early-stage training to establish a foundation, followed by moderate late-stage training to prevent overfitting and optimize final performance. The framework&#8217;s effectiveness is supported by publicly available resources.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07465" target="_blank">https://huggingface.co/papers/2605.07465</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233111415.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>67. TMAS: Scaling Test-Time Compute via Multi-Agent Synergy</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TMAS, Large Language Models, Multi-Agent Synergy, Hierarchical Memories, Hybrid Reward Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to enhance the reasoning ability of large language models by implementing a multi-agent framework called TMAS, which organizes inference as a collaborative process among agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; TMAS employs structured collaboration through hierarchical memory systems to facilitate efficient cross-trajectory collaboration, supported by a hybrid reward reinforcement learning scheme.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TMAS significantly improves iterative scaling beyond existing test-time scaling techniques, with hybrid reward training further increasing the effectiveness and stability of this process.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.10344" target="_blank">https://huggingface.co/papers/2605.10344</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233047736.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>68. Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Language models, Benchmark, Ill-posed problems, Frontier models, Mathematical knowledge</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce &#8220;Soohak,&#8221; a new 439-problem mathematical benchmark to evaluate advanced reasoning in language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Created by 64 mathematicians, Soohak includes a Challenge subset for evaluating problem-solving skills and a Refusal subset to test recognition of ill-posed problems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Existing frontier models show significant room for improvement; none exceed 50% in recognizing ill-posed problems, highlighting a critical area for model enhancement.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.09063" target="_blank">https://huggingface.co/papers/2605.09063</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260512233024599.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260512233424090.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260511</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260511/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Tue, 12 May 2026 00:41:29 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260511/</guid>

					<description><![CDATA[1. Mean Mode Screaming: Mean&#8211;Variance Split Residuals for 1000-Layer Diffusion Transformers 🔑 Keywords: Diffusion Transformers, Mean-Dominated Collapse, Mean Mode Screaming, Mean-Variance Split [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Mean Mode Screaming: Mean&#8211;Variance Split Residuals for 1000-Layer Diffusion Transformers</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Diffusion Transformers, Mean-Dominated Collapse, Mean Mode Screaming, Mean-Variance Split Residuals</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address structural instability in deep Diffusion Transformers caused by Mean Mode Screaming, which leads to mean-dominated collapse at extreme depths.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing Mean-Variance Split (MV-Split) Residuals to stabilize training and prevent collapses by combining centered residual updates with leaky trunk-mean replacement.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MV-Split Residuals effectively prevent divergent collapse in a 400-layer DiT, maintaining stable training beyond the baseline approach. The approach was further validated with a 1000-layer DiT, confirming trainability at extreme depths.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06169" target="_blank">https://huggingface.co/papers/2605.06169</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233010259.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. Flow-OPD: On-Policy Distillation for Flow Matching Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Flow-OPD, Flow Matching, On-Policy Distillation, Manifold Anchor Regularization, Stable Diffusion</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the limitations in Flow Matching text-to-image models by introducing a new framework, Flow-OPD, that improves generation quality and alignment metrics.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A two-stage alignment approach is used, combining on-policy distillation and manifold anchor regularization. Initial domain-specialized teacher models are cultivated via single-reward GRPO fine-tuning, followed by a Flow-based Cold-Start scheme and a three-step orchestration process for consolidating expertise.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Flow-OPD achieves significant improvements in GenEval score and OCR accuracy compared to vanilla GRPO, enhancing image fidelity and human-preference alignment. The framework showcases a &#8216;teacher-surpassing&#8217; effect, demonstrating its potential as a scalable alignment paradigm for generalist text-to-image models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08063" target="_blank">https://huggingface.co/papers/2605.08063</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233039248.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Group-based policy gradient, Listwise Policy Optimization, reinforcement learning, divergence minimization, response simplex</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve training performance and stability in reinforcement learning with verifiable rewards by developing Listwise Policy Optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; This study employs a group-based policy gradient framework, leveraging a common geometric structure through target projection utilizing divergence minimization to optimize the policy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Listwise Policy Optimization shows monotonic improvement on listwise objectives, enhances training performance over typical baselines, ensures optimization stability, and maintains response diversity across various reasoning tasks and LLM backbones.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06139" target="_blank">https://huggingface.co/papers/2605.06139</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233105614.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. Beyond Retrieval: A Multitask Benchmark and Model for Code Search</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CoREB, code retrieval, reranking, embedding models, code-to-code retrieval</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce CoREB, a new benchmark addressing limitations in existing code search datasets, focusing on multitask evaluation and reranking capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed CoREB from rewritten LiveCodeBench problems in five languages with graded relevance judgments.</p>
<p>   &#8211; Benchmarked eleven embedding models and five rerankers across text-to-code, code-to-text, and code-to-code tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Code-specialized embeddings outperform general encoders for code-to-code retrieval but no model excels across all tasks.</p>
<p>   &#8211; Developer-style short keyword queries lead to poor performance across models.</p>
<p>   &#8211; Off-the-shelf rerankers show asymmetry across tasks; however, the fine-tuned CoREB-Reranker consistently improves performance across all tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04615" target="_blank">https://huggingface.co/papers/2605.04615</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233203247.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. HumanNet: Scaling Human-centric Video Learning to One Million Hours</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: HumanNet, embodied intelligence, vision-language-action, egocentric video, human-centric</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce HumanNet, a large-scale human-centric video dataset designed to enhance training of vision-language-action models by replacing robot data with egocentric human video.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of HumanNet as a dataset incorporating various perspectives of human interactions, including fine-grained activities and diverse environments with interaction-centric annotations, enabling enhanced representation learning and activity understanding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Controlled experiments demonstrate that HumanNet&#8217;s egocentric human video can effectively substitute for robot data, providing a scalable and cost-effective solution for embodied foundation models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06747" target="_blank">https://huggingface.co/papers/2605.06747</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260511233131809.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: UniPrefill, vLLM, Long-Context Inference, Continuous Batching, Prefill-Decode Co-processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The main objective of this study is to develop UniPrefill, a prefill acceleration framework to enhance the efficiency and integration of long-context inference across diverse model architectures, especially within modern inference engines like vLLM.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; UniPrefill operates as a continuous batching operator while extending vLLM&#8217;s scheduling strategy to support prefill-decode co-processing and tensor parallelism, allowing for seamless integration and improved computational speed at the token level.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; UniPrefill achieves up to 2.1x speedup in Time-To-First-Token (TTFT), with increasingly pronounced performance improvements as the number of concurrent requests grows, highlighting its effectiveness in accelerating long-context inference.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06221" target="_blank">https://huggingface.co/papers/2605.06221</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233228651.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, Credit Assignment, Entropy Dynamics, Exploration-Exploitation Trade-off, Response-Level Uncertainty</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research introduces a supervision-free method for credit assignment in RL to enhance exploration-exploitation balance and task performance in language model agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method adaptively modulates entropy dynamics at the response level in RL training to align with the effective action granularity of LLM agents, using a practical response-level uncertainty proxy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate that the proposed method improves on strong RL baselines, with noted gains when integrated into a state-of-the-art software-engineering RL framework.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.00425" target="_blank">https://huggingface.co/papers/2605.00425</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233255029.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Sparse Attention, Mixture-of-Experts, Long Contexts, Query Heads, Computational Efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective: </p>
<p>   &#8211; The study aims to replace the dense token-wise indexing in sparse attention with MISA (Mixture of Indexer Sparse Attention) to reduce computational costs while maintaining performance for long contexts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods: </p>
<p>   &#8211; MISA uses a routed mixture-of-experts approach to create a query-dependent pool of active heads, optimizing only those heads through a lightweight router that employs block-level statistics, thus minimizing the token-level scoring cost.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions: </p>
<p>   &#8211; MISA matches the performance of dense DSA indexers on LongBench and GLM-5 with significantly fewer active heads, improves computational efficiency up to 3.82 times on specific hardware, and retains more than 92% of token selections by DSA per layer.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07363" target="_blank">https://huggingface.co/papers/2605.07363</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233352632.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MatryoshkaLoRA, Low-Rank Adaptation, Dynamic Rank Selection, Hierarchical Low-Rank Representations, AURAC  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:  </p>
<p>   &#8211; The research aims to introduce MatryoshkaLoRA, a hierarchical low-rank adaptation framework that dynamically adjusts rank selection to improve accuracy-performance trade-offs over existing methods.  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:  </p>
<p>   &#8211; The framework involves inserting a fixed diagonal matrix between existing adapters to scale sub-ranks appropriately and ensures efficient embedding of gradient information across all hierarchical ranks. The research proposes a new evaluation metric, Area Under the Rank Accuracy Curve (AURAC), for assessing performance.  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:  </p>
<p>   &#8211; MatryoshkaLoRA achieves more accurate hierarchical low-rank representations compared to prior rank-adaptive approaches, providing superior accuracy-performance trade-offs across evaluated datasets.  </p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07850" target="_blank">https://huggingface.co/papers/2605.07850</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233322521.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. What if AI systems weren&#8217;t chatbots?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Conversational Chatbots, Sociotechnical Configuration, AI Ethics, Labor Displacement, Environmental Costs</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Human-AI Interaction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper investigates the sociotechnical implications of AI&#8217;s focus on conversational chatbot interfaces and their widespread adoption.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; An analytical approach examining the effects of normalizing chatbot interactions on social, economic, and environmental aspects.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Chatbot-based systems often fail to meet complex user needs, influence patterns of work and learning, contribute to deskilling, and have broader societal impacts such as labor displacement and increased economic concentration. The paper advocates for AI development that embraces pluralistic system design and task-specific tools, highlighting the necessity for accountability and sustainability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07896" target="_blank">https://huggingface.co/papers/2605.07896</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233511513.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Autoregressive Normalizing Flows, Transformer, Multimodal Generation, Causal Mask, KV-cache</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enable unified multimodal generation systems that can seamlessly handle interleaved text and image sequences without structural mismatches.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study utilizes autoregressive normalizing flows based on Transformer architecture, specifically through the introduction of STARFlow2 built on the Pretzel architecture with shared causal masking and KV-cache mechanisms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The findings demonstrate that autoregressive normalizing flows offer strong performance in both image generation and multimodal understanding, validating their potential as a foundational model for unified multimodal modeling.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08029" target="_blank">https://huggingface.co/papers/2605.08029</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233445665.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. UniSD: Towards a Unified Self-Distillation Framework for Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-distillation, Autoregressive Language Models, UniSD, Training Stability, Efficient Adaptation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To systematically study and enhance the adaptation of autoregressive language models using the self-distillation framework UniSD.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Integration of mechanisms including multi-teacher agreement, EMA teacher stabilization, token-level contrastive learning, feature matching, and divergence clipping within the UniSD framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; UniSD effectively identifies factors that improve self-distillation over static imitation and highlights how components interact across tasks. The integrated pipeline, UniSDfull, achieves significant performance improvements, demonstrating self-distillation as a practical approach for efficient LLM adaptation without external teachers.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06597" target="_blank">https://huggingface.co/papers/2605.06597</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233421235.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: latent manifold, diffusion models, Prior-Aligned AutoEncoder, latent space, generative modeling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to explore latent manifold properties in diffusion models and proposes a Prior-Aligned AutoEncoder (PAE) to optimize latent space for enhanced generative modeling.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study involves constructing controlled tokenizer variants to identify key properties of a diffusion-friendly latent manifold, including coherent spatial structure, local continuity, and global semantics.</p>
<p>   &#8211; It introduces PAE, which uses refined priors and perturbation-based regularization to explicitly shape the latent manifold.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The findings indicate that organizing the latent manifold enhances generative modeling quality, with PAE achieving improvements in training efficiency and generation quality, establishing a new state-of-the-art performance on ImageNet 256&#215;256.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07915" target="_blank">https://huggingface.co/papers/2605.07915</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233537588.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. Fast Byte Latent Transformer</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Byte Latent Transformer, BLT Diffusion, Speculative Decoding, Inference Procedure, Byte-Level Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the research is to address the slow byte-by-byte autoregressive generation in byte-level language models by introducing new training and generation techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of BLT Diffusion (BLT-D), which uses block-wise diffusion objectives for faster parallel processing.</p>
<p>   &#8211; Development of two extensions: BLT Self-speculation and BLT Diffusion+Verification, to enhance generation quality while balancing speed.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The new methods significantly reduce memory-bandwidth cost and logistical barriers, improving the practical use of byte-level language models in generative tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08044" target="_blank">https://huggingface.co/papers/2605.08044</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233650547.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. ModelLens: Finding the Best for Your Task from Myriads of Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ModelLens, leaderboard interactions, unified framework, model recommendation, performance-aware latent space</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop ModelLens, a unified framework for recommending models in real-world scenarios by leveraging public leaderboard data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a performance-aware latent space to rank unseen models on unseen datasets without direct evaluation on the target dataset.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated that ModelLens surpasses existing baselines, improving routing methods by up to 81% across various QA benchmarks, and generalizing well to both text and vision-language tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07075" target="_blank">https://huggingface.co/papers/2605.07075</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233625299.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. IntentGrasp: A Comprehensive Benchmark for Intent Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: IntentGrasp, intent understanding, Large Language Model, benchmark, Intentional Fine-Tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to introduce IntentGrasp as a benchmark to evaluate the intent understanding capability of large language models (LLMs).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; IntentGrasp is derived from 49 corpora spanning 12 domains and involves curation, contextualization, and task unification to form a training set and two evaluation sets.</p>
<p>   &#8211; Extensive evaluations were conducted on 20 LLMs using IntentGrasp, followed by Intentional Fine-Tuning for improving the models&#8217; performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The LLMs demonstrated poor performance on IntentGrasp, with most models scoring below expectation.</p>
<p>   &#8211; Intentional Fine-Tuning significantly improved the performance across evaluation sets and showed strong cross-domain generalizability.</p>
<p>   &#8211; The study presents Intentional Fine-Tuning as a promising approach to enhance intent understanding, aiming for more capable and safe AI assistants.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06832" target="_blank">https://huggingface.co/papers/2605.06832</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233601548.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Speculative decoding, SpecBlock, AI-generated summary, Autoregressive drafters, Path dependence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces SpecBlock, a block-iterative drafter that enhances LLM inference speed and maintains accuracy by combining path dependence with economical drafting techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a block-based approach where each drafter forward provides multiple dependent positions, enhancing the draft tree with iterative block expansion while implementing mechanisms for path dependence.</p>
<p>   &#8211; Incorporates a co-trained rank head and valid-prefix mask for more efficient and accurate drafting.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SpecBlock achieves a mean speedup improvement of 8-13% over EAGLE-3 at significantly lower drafting costs, and further cost-aware adaptations enhance performance gain to 11-19%.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07243" target="_blank">https://huggingface.co/papers/2605.07243</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233747199.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. LiVeAction: a Lightweight, Versatile, and Asymmetric Neural Codec Design for Real-time Operation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Lightweight neural codec, Rate-distortion performance, FFT-like structure, Variance-based rate penalty, Resource-constrained devices</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a lightweight neural codec architecture that improves rate-distortion performance for resource-constrained devices by using an FFT-like structure and variance-based rate penalty.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented an FFT-like structure to reduce the complexity of the encoder and replace adversarial and perceptual losses with a variance-based rate penalty to accommodate arbitrary signal modalities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed architecture, LiVeAction, achieves superior rate-distortion performance compared to existing generative tokenizers and remains practical for deployment on low-power sensors.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06628" target="_blank">https://huggingface.co/papers/2605.06628</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233716316.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. Empirical Evidence for Simply Connected Decision Regions in Image Classifiers</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Decision regions, Path connected, Simply connected, Quad-mesh filling procedure, Coons patches</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate whether closed loops inside decision regions in deep neural networks can be contracted without leaving the region, thus exploring their simple connectivity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; An iterative quad-mesh filling procedure is used to construct a label-preserving surface within the decision region, connecting it to Coons patches for geometric interpolation analysis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study provides empirical evidence supporting that decision regions in deep neural networks are not only path connected but also simply connected.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06380" target="_blank">https://huggingface.co/papers/2605.06380</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233811375.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, Cognitive labor, Compute capital, Wage-setting, Factor-pricing framework</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To challenge the conventional view of AI agents as labor substitutes by proposing a new economic framing where AI agents are a production technology that converts compute capital into cognitive labor.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Application of a classic factor-pricing framework to derive a Compute-Anchored Wage (CAW) bound, and use of constant elasticity of substitution (CES) aggregation to differentiate between substitutable and complementary tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The equilibrium wage for cognitive labor is increasingly governed by the compute capital market rather than traditional labor markets, shifting the price-setting mechanism due to AI agents being conceptualized as a production technology.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05558" target="_blank">https://huggingface.co/papers/2605.05558</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233836263.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Deployment-time learning, Large language models, Episodic memory, Contextual bandit, Continual Adaptation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce deployment-time learning (DTL) as an additional stage in the lifecycle of large language models (LLMs), enabling them to adapt and improve during deployment without modifying model parameters.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of the CASCADE framework, which leverages episodic memory and formulates experience reuse as a contextual bandit problem, allowing for effective exploration-exploitation trade-offs and providing long-term no-regret guarantees.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CASCADE framework demonstrated a 20.9% improvement in macro-averaged success rate across 16 diverse tasks compared to zero-shot prompting and outperformed other gradient-based and memory-based approaches, establishing a foundation for continually improving AI systems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06702" target="_blank">https://huggingface.co/papers/2605.06702</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233944277.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PrefixGuard, trace analysis, prefix-based risk scoring, StepView induction, event abstraction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce PrefixGuard, a framework for online monitoring of large language model (LLM) agents through trace analysis and prefix-based risk scoring.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training to induce deterministic typed-step adapters from raw trace samples.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PrefixGuard demonstrates strong performance across multiple benchmark tasks, achieving notable improvements over raw-text controls. Despite LLM judges being weaker under the same protocol, PrefixGuard offers actionable early alerts with explicit diagnostics to support effective interventions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06455" target="_blank">https://huggingface.co/papers/2605.06455</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233917588.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. R^3-SQL: Ranking Reward and Resampling for Text-to-SQL</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: R^3-SQL, Text-to-SQL, unified reward, agentic resampling, execution accuracy</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to address inconsistencies in scoring functionally equivalent SQL queries and to improve candidate recall in Text-to-SQL systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces R^3-SQL, a framework that employs unified reward ranking by grouping candidates by execution result and combining pairwise preference with pointwise utility.</p>
<p>   &#8211; Utilizes agentic resampling to enhance candidate pool by selectively resampling when correct SQL is likely absent.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; R^3-SQL achieves 75.03 execution accuracy on BIRD-dev, establishing a new state of the art, with consistent improvements across five benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25325" target="_blank">https://huggingface.co/papers/2604.25325</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234137611.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SPEED, KV-visibility policy, long-context inference, Llama-3.1-8B, prompt tokens</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To reduce long-context inference costs in decoder-only language models by implementing a phase-asymmetric KV-visibility policy called SPEED.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced Shallow Prefill, dEEp Decode (SPEED) that materializes non-anchor prompt-token KV states only in lower layers.</p>
<p>   &#8211; Conducted a controlled Llama-3.1-8B instruction-tuning study comparing the effectiveness of using only 75% of layers for prefill tokens.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SPEED maintains benchmark quality while reducing long-context costs and improving efficiency metrics like TTFT and TPOT.</p>
<p>   &#8211; Demonstrates that long-context prompt tokens do not need to persist as full-depth KV-cache objects if Decode-phase tokens remain full-depth.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06105" target="_blank">https://huggingface.co/papers/2605.06105</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234109262.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>25. Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SAEgis, adversarial attacks, Vision-Language Models, sparse autoencoders, cross-domain generalization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce SAEgis, a framework for detecting adversarial attacks on Vision-Language Models using sparse autoencoders.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Insert sparse autoencoder modules into pretrained Vision-Language Models, trained with reconstruction objectives to capture attack-relevant signals naturally.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SAEgis achieves robust performance across in-domain, cross-domain, and cross-attack settings without requiring additional training, enhancing the safety of real-world VLM systems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07447" target="_blank">https://huggingface.co/papers/2605.07447</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234041030.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>26. BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Image captioning, multimodal large language models, reinforcement learning, continuous multi-objective reward formulation, GDPO-style</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary goal is to develop a balanced reinforcement learning framework for image captioning that optimizes correctness, coverage, and linguistic quality, surpassing the performance of existing methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs a continuous multi-objective reward formulation with GDPO-style reward-decoupled normalization and length-conditional reward masking to enhance captioning performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method consistently improves caption quality with significant gains in various models, evidencing enhancements of +13.6 in DCScore, +9.0 in CaptionQA, and +29.0 in CapArena.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07394" target="_blank">https://huggingface.co/papers/2605.07394</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234009946.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>27. Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Delta-Adapter, single-pair supervision, semantic delta, pre-trained vision encoder, Perceiver-based adapter</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve image editing accuracy and generalization through a method called Delta-Adapter, which enables editing with single-pair supervision by utilizing semantic deltas extracted from pre-trained vision encoders.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employed Delta-Adapter by using a pre-trained vision encoder to extract semantic deltas and integrating them into an image editing model via a Perceiver-based adapter. Introduced semantic delta consistency loss to enhance transformation fidelity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Delta-Adapter demonstrates consistent improvement in editing accuracy and content consistency, effectively generalizing to both seen and unseen editing tasks over existing baselines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07940" target="_blank">https://huggingface.co/papers/2605.07940</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234204755.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>28. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605111778542933.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>29. Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Discrete Flow Matching, Trajectory-Shaped Guidance, Distillation, Perplexity, Language Modeling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Improve text generation efficiency by replacing stochastic jumps with trajectory-shaped guidance to achieve better performance with reduced computational requirements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize Trajectory-Shaped Discrete Flow Matching (TS-DFM) which employs a lightweight energy compass for guided navigation, evaluating candidate continuations at each midpoint during training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TS-DFM achieves superior perplexity compared to baseline discrete-generation methods, is notably faster, and effective across different source distributions and evaluators.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07924" target="_blank">https://huggingface.co/papers/2605.07924</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234150884.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>30. CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Domain Generalization, Common Principal Component Analysis, zero-shot transfer, domain-invariant subspace, invariant learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to create a structured domain-invariant subspace using Common Principal Component Analysis to enhance domain generalization under out-of-distribution conditions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; CPCANet employs the iterative Flury-Gautschi algorithm within differentiable neural layers, integrating statistical properties of CPCA into a trainable framework to identify shared subspaces across domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CPCANet achieves state-of-the-art performance in zero-shot transfer without the need for architecture-specific tuning, making it a simple and efficient solution for learning robust representations amid distribution shifts.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05136" target="_blank">https://huggingface.co/papers/2605.05136</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234122891.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>31. Rubric-based On-policy Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-policy distillation, teacher logits, structured semantic rubrics, sample efficiency, rubric-based OPD</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To demonstrate the effectiveness of rubric-based OPD as a scalable and black-box-compatible alternative to traditional logit-based methods for model alignment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of ROPD, a framework using structured semantic rubrics from teacher-student contrasts to score student rollouts for on-policy optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ROPD achieves up to a 10x gain in sample efficiency over advanced logit-based OPD methods, positioning it as a flexible alternative suitable for both proprietary and open-source large language models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07396" target="_blank">https://huggingface.co/papers/2605.07396</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511234056206.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>32. Learning Visual Feature-Based World Models via Residual Latent Action</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Visual world models, Residual Latent Action, RLA World Model, Robot Learning, Video Diffusion</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce a novel world model, RLA-WM, based on Residual Latent Action representations, for predicting future visual features efficiently and for enhancing robot learning techniques. </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized the RLA to predict transitions using flow matching, and developed robot learning methods that leverage the new RLA World Model for policy learning from offline videos.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RLA-WM outperforms current state-of-the-art methods in both simulation and real-world settings with improved efficiency, and it supports novel robot learning approaches that require no online interaction or handcrafted rewards.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07079" target="_blank">https://huggingface.co/papers/2605.07079</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260511234022743.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>33. CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continuous Glucose Monitoring, Self-supervised Pretraining, Cross-Modal, Cohort Generalization, AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a self-supervised pretraining framework (CGM-JEPA) that abstracts continuous glucose monitoring data for better cross-modal and cross-cohort performance by predicting masked latent representations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Pretraining on unlabeled CGM data from 228 subjects.</p>
<p>   &#8211; Evaluation through cross-validation on clinical cohorts in various regimes like cohort generalization and venous-to-CGM transfer.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; X-CGM-JEPA achieves top performance on AUROC across all regimes, significantly exceeding baseline models.</p>
<p>   &#8211; Introduces a novel distributional objective that enhances performance under modality shift and improves label-aware clustering on sparse venous data.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.00933" target="_blank">https://huggingface.co/papers/2605.00933</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233957482.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>34. Gated QKAN-FWP: Scalable Quantum-inspired Sequence Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Quantum-inspired, Fast Weight Programmers, NISQ device compatibility, Scalar-gated update, Solar cycle forecasting</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Quantum Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a quantum-inspired fast-weight programming framework using single-qubit circuits that achieves superior forecasting performance with fewer parameters compared to classical recurrent models, while ensuring compatibility with NISQ devices.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Integration of Fast Weight Programmers (FWPs) with Quantum-inspired Kolmogorov-Arnold Network (QKAN) using single-qubit data re-uploading circuits as nonlinear activation.</p>
<p>   &#8211; Introduction of a scalar-gated fast-weight update rule with theoretical analysis on adaptive memory kernel, geometric boundedness, and parallelizable gradient paths.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed framework achieves lower scaled Mean Square Error (MSE), peak amplitude error, and peak timing error in long-horizon solar cycle forecasting compared to traditional recurrent models, including LSTM networks, with significantly fewer parameters.</p>
<p>   &#8211; The framework is validated on NISQ devices, maintaining high forecasting accuracy, highlighting its scalability, parameter efficiency, and practical applicability in real-world scenarios.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06734" target="_blank">https://huggingface.co/papers/2605.06734</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233930607.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>35. Rethinking RL for LLM Reasoning: It&#8217;s Sparse Policy Selection, Not Capability Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, ReasonMaxxer, Large Language Models, Token-Level Analysis, Entropy</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to investigate whether the reinforcement learning (RL) optimization loop is necessary for correcting uncertainties at specific decision points in large language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors conducted a token-level analysis across various model families and RL algorithms to assess the impact of RL on decision-making and utilized entropy-gated decision point corrections through the proposed ReasonMaxxer method.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; It was concluded that RL primarily corrects uncertainty at key decision points rather than acquiring new capabilities, and ReasonMaxxer can achieve similar or better performance compared to full RL with significantly reduced training costs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06241" target="_blank">https://huggingface.co/papers/2605.06241</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233849214.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>36. Discovering Reinforcement Learning Interfaces with Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Automated reinforcement learning interface discovery, LLM-guided evolutionary algorithms, observation mappings, reward functions, co-design</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to automate the discovery of reinforcement learning task interfaces, particularly focusing on generating both observation mappings and reward functions from raw simulator states using LLM-guided evolutionary algorithms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research introduces LIMEN, an evolutionary framework guided by large language models (LLMs), which iteratively refines candidate interfaces using policy training feedback across various control domains, including novel discrete gridworld tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research demonstrates that the joint evolution of observations and rewards can effectively discover RL interfaces, significantly reducing the need for manual engineering. It highlights that optimizing these components together is crucial, as single-component optimization leads to failures in certain domains.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03408" target="_blank">https://huggingface.co/papers/2605.03408</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233825078.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>37. Steering Visual Generation in Unified Multimodal Models with Understanding Supervision</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Unified multimodal models, Understanding-Oriented Post-Training, generative representations, semantic abstraction, visual regression</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance generative models by integrating comprehension tasks as supervisory signals for better image generation and editing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A lightweight framework named Understanding-Oriented Post-Training (UNO) is introduced, treating understanding as both a distinct task and a supervisory signal for generative representations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Extensive experiments demonstrate that incorporating understanding as a catalyst can significantly improve the performance of image generation and editing.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05781" target="_blank">https://huggingface.co/papers/2605.05781</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233759514.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>38. MDN: Parallelizing Stepwise Momentum for Delta Linear Attention</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Linear Attention, Momentum DeltaNet, Large Language Models, Stochastic Gradient Descent, Momentum-Based Optimizers</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Address challenges in training efficiency and convergence in Linear Attention models using a momentum-based approach.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop a chunkwise parallel algorithm with a stepwise momentum rule.</p>
<p>   &#8211; Analyze momentum-based recurrence through a dynamical systems perspective with complex conjugate eigenvalues.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Momentum DeltaNet (MDN) achieves comparable training throughput to existing models and demonstrates consistent performance improvements across benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05838" target="_blank">https://huggingface.co/papers/2605.05838</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233732130.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>39. From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Model, memory mechanisms, continual learning, proactive exploration, cross-trajectory abstraction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to bridge the gaps between operating systems engineering and cognitive science through a unified evolutionary framework for LLM agent memory mechanisms, structured into three phases: Storage, Reflection, and Experience.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The paper proposes a formal definition and analysis of the stages of memory mechanisms within LLM-based agents. It focuses on the need for long-range consistency, challenges in dynamic environments, and the goal of continual learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; By synthesizing fragmented insights in existing literature, the paper offers design principles and a roadmap for the future development of next-generation LLM agents, highlighting transformative mechanisms like proactive exploration and cross-trajectory abstraction.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06716" target="_blank">https://huggingface.co/papers/2605.06716</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233702083.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>40. InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Interleaved Language-Vision Agentic Search, Multimodal Search, Visual Evidence Seeking, Multimodal Evidence Integration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research introduces the InterLV-Search benchmark to assess interleaved Language-Vision agentic search, emphasizing the repeated use of both textual and visual evidence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study leverages 2,061 examples across active visual evidence seeking, controlled offline interleaved searching, and open-web searching, constructed through automated and machine-led, human-supervised pipelines.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current systems struggle significantly with interleaved multimodal search, evidenced by a best model accuracy below 50%, revealing ongoing challenges in visual evidence seeking, search control, and multimodal evidence integration.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07510" target="_blank">https://huggingface.co/papers/2605.07510</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233637886.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>41. SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SkCC, AI-generated summary, LLM-Agents, SKILL.md, agent frameworks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces SkCC, a compilation framework designed to enable portable deployment of agent skills across various platforms with improved security and performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SkCC uses a strongly-typed intermediate representation (SkIR) to decouple skill semantics from platform-specific formatting and includes a compile-time Analyzer to enforce security constraints via Anti-Skill Injection.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate that SkCC consistently improves performance and security, increasing pass rates on multiple platforms and achieving substantial runtime token savings.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03353" target="_blank">https://huggingface.co/papers/2605.03353</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233613510.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>42. Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continual learning, class-incremental learning, pre-trained model, bi-level routing, OmniBenchmark-1K</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address the challenge of class-incremental learning (CIL) with a novel continual learning framework called CaRE, capable of handling very long task sequences efficiently.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of a Bi-Level Routing Mixture-of-Experts (BR-MoE) mechanism, which includes a router selection stage and an expert routing phase to optimize task-specific and expert-based learning processes.</p>
<p>   &#8211; Presentation of a new dataset, OmniBenchmark-1K, for evaluating CIL performance on extensive task sequences involving hundreds of tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CaRE exhibits superior performance across various datasets and task configurations, particularly in extensive task sequences ranging from 100 to over 300 non-overlapping tasks, surpassing current baselines by a notable margin.</p>
<p>   &#8211; The work encourages further exploration into continual learning over extremely prolonged task sequences, offering publicly accessible code and dataset.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2602.03473" target="_blank">https://huggingface.co/papers/2602.03473</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233549538.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>43. SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SCOPE, Semantic Commitments, Conceptual Rift, Specification-Guided Skill Orchestration, Entity-Gated Intent Pass Rate</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the study is to enhance complex visual intent fulfillment in text-to-image generation by maintaining semantic commitments throughout the process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces SCOPE, a specification-guided framework that organizes retrieval, reasoning, and repair skills to uphold semantic commitments, and evaluates it using a human-annotated benchmark, Gen-Arena, equipped with Entity-Gated Intent Pass Rate (EGIP).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SCOPE significantly improves performance over baselines on Gen-Arena, achieving 0.60 EGIP, and shows strong results on other benchmarks like WISE-V and MindBench, highlighting its effectiveness in persistent commitment tracking for complex image generation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08043" target="_blank">https://huggingface.co/papers/2605.08043</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233522149.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>44. Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Q-RAG, Multi-step retrieval, Reinforcement learning, Embedder model, Long-context benchmarks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance large language models&#8217; retrieval capabilities for complex queries by introducing Q-RAG, a multi-step retrieval approach fine-tuned using reinforcement learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study fine-tunes the Embedder model in a resource-efficient manner using reinforcement learning for multi-step retrieval, offering a new alternative to traditional resource-intensive methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Q-RAG achieves state-of-the-art results on long-context benchmarks BabiLong and RULER, demonstrating its effectiveness and efficiency for open-domain question answering tasks with contexts up to 10 million tokens.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2511.07328" target="_blank">https://huggingface.co/papers/2511.07328</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233459494.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>45. Normalizing Trajectory Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Normalizing Trajectory Models, diffusion-based generation, likelihood training, self-distillation, text-to-image benchmarks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Normalizing Trajectory Models (NTM) to improve diffusion-based generation with exact likelihood training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Model each reverse step as a conditional normalizing flow, integrating shallow invertible blocks and a deep parallel predictor for high-quality sample generation in few steps.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; NTM achieves superior performance on text-to-image benchmarks, matching or surpassing existing baselines in just four sampling steps while maintaining exact likelihood over the generative trajectory.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08078" target="_blank">https://huggingface.co/papers/2605.08078</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233434819.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>46. A^2RD: Agentic Autoregressive Diffusion for Long Video Consistency</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: A^2RD, Agentic Auto-Regressive Diffusion, long video synthesis, semantic drift, narrative collapse</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address the challenges of synthesizing consistent and coherent long videos by introducing A^2RD, a novel architecture that separates creative synthesis from consistency enforcement.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of a closed-loop process with a Retrieve&#8211;Synthesize&#8211;Refine&#8211;Update cycle, utilizing components like Multimodal Video Memory, Adaptive Segment Generation, and Hierarchical Test-Time Self-Improvement.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; A^2RD significantly enhances consistency and narrative coherence compared to existing baselines, achieving improvements of up to 30% in consistency and 20% in narrative coherence on public benchmarks and the novel LVBench-C. Human evaluations confirm improvements in motion and transition smoothness.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06924" target="_blank">https://huggingface.co/papers/2605.06924</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233408671.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>47. 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: 4D reasoning, vision-language models, dynamic spatial reasoning, data generation pipeline, Dynamic-Imagery Fine-Tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce 4DThinker, a framework enabling vision-language models to perform dynamic spatial reasoning through 4D latent mental imagery.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize a scalable, annotation-free data generation pipeline to create 4D reasoning data from raw videos.</p>
<p>   &#8211; Implement Dynamic-Imagery Fine-Tuning (DIFT) for grounding models in dynamic visual semantics.</p>
<p>   &#8211; Apply 4D Reinforcement Learning (4DRL) with outcome-based rewards for complex reasoning tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; 4DThinker outperforms existing baselines in dynamic spatial reasoning and offers a new approach to 4D reasoning in VLMs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05997" target="_blank">https://huggingface.co/papers/2605.05997</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233335681.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>48. Rethinking State Tracking in Recurrent Models Through Error Control Dynamics</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Affine recurrent networks, State tracking, Error control, State-space models, Linear Attention</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to examine the limitations of affine recurrent networks in achieving robust state tracking and highlight the significance of error control alongside expressive capacity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Theoretical analysis and empirical demonstration on group state-tracking tasks are used to understand the failure mechanics of affine recurrent networks in maintaining accurate state tracking.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Affine recurrent networks fail to correct errors in state-separating subspaces and resort to finite horizon solutions driven by accumulated errors. The study demonstrates that robust state tracking relies not only on the architecture&#8217;s expressivity but critically on error control and predicts that tracking collapses when the distinguishability ratio surpasses the readability threshold.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07755" target="_blank">https://huggingface.co/papers/2605.07755</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233310391.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>49. DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, security, red-teaming, simulation environments, attack strategies</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate and enhance AI agent security through a controllable red-teaming platform, DecodingTrust-Agent Platform (DTap).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Use of DTap to simulate 14 real-world domains and over 50 simulation environments.</p>
<p>   &#8211; Introduction of DTap-Red, an autonomous red-teaming agent, to systematically explore injection vectors and discover effective attack strategies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Large-scale evaluations using DTap-Bench dataset reveal systematic vulnerability patterns in AI agents.</p>
<p>   &#8211; Insights are provided for developing more secure next-generation AI agents.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04808" target="_blank">https://huggingface.co/papers/2605.04808</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233239474.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>50. TextLDM: Language Modeling with Continuous Latent Diffusion</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TextLDM, Representation Alignment, Diffusion Transformers, GPT-2, OpenWebText2</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to adapt visual latent diffusion transformers for language modeling by mapping discrete tokens to continuous latents and utilizing representation alignment for enhanced text generation quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; TextLDM employs a Transformer-based VAE to map tokens to latents, enhanced by Representation Alignment with a frozen pretrained language model. Standard Diffusion Transformers perform flow matching in the latent space.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TextLDM, trained on OpenWebText2, significantly outperforms prior diffusion language models, matching GPT-2 under similar conditions. This demonstrates the effectiveness of transferring the visual DiT recipe to language for multimodal generation and understanding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07748" target="_blank">https://huggingface.co/papers/2605.07748</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233215650.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>51. Anisotropic Modality Align</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Modality Gap, Multimodal Model, Anisotropic Geometric Correction, Semantic Geometry, Modality Alignment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to address the Modality Gap in multimodal models through an Anisotropic Geometric Correction framework for effective unpaired modality alignment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study revisits the geometric nature of the modality gap and proposes the AnisoAlign framework, which focuses on anisotropic modality gap alignment by aligning with the target-modality distribution while preserving the source modality&#8217;s semantic structure.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate that the proposed AnisoAlign framework benefits both geometric diagnostics and text-only multimodal large language model training, transforming the modality gap from an empirical observation into a correctable structured geometric phenomenon.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07825" target="_blank">https://huggingface.co/papers/2605.07825</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233148798.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>52. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AutoTTS, Test-time scaling, controller synthesis, reasoning trajectories, probe signals</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to automate the discovery of test-time scaling (TTS) strategies by developing AutoTTS, an environment-driven framework that replaces hand-crafted heuristics with automatic strategy discovery.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; AutoTTS operates by formulating TTS as controller synthesis over reasoning trajectories and probe signals. It introduces beta parameterization and fine-grained execution trace feedback to optimize strategy discovery.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiment results demonstrate that AutoTTS-discovered TTS strategies outperform manually designed baselines, enhancing the accuracy-cost tradeoff efficiently across various benchmarks, and generalizing well across different model scales at a low discovery cost.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.08083" target="_blank">https://huggingface.co/papers/2605.08083</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233118915.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>53. HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: HyperEyes, Parallel Multimodal Search, Reinforcement Learning, Inference Efficiency, Dual-Grained Efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a parallel multimodal search agent, HyperEyes, that performs concurrent entity searches while enhancing inference efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; HyperEyes is trained using a Parallel-Amenable Data Synthesis Pipeline and a Dual-Grained Efficiency-Aware Reinforcement Learning framework. Additionally, specific benchmarks (IMEB) are introduced to evaluate both accuracy and efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; HyperEyes-30B achieves significant improvements, surpassing the strongest comparable open-source agent by 9.9% in accuracy with a 5.3x reduction in tool-call rounds on average.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.07177" target="_blank">https://huggingface.co/papers/2605.07177</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233051704.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>54. MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MACE-Dance, AI-generated content (AIGC), music-driven dance generation, Mixture-of-Experts, diffusion models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary goal is to create a high-quality music-driven dance video generation framework, MACE-Dance, that excels in producing realistic human motion and visual appearance by integrating cascaded Mixture-of-Experts and diffusion models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a hybrid architecture consisting of Mixture-of-Experts and a diffusion model within the Motion Expert to convert music into 3D motion, ensuring kinematic plausibility and artistic expressiveness.</p>
<p>   &#8211; Adoption of a decoupled kinematic-aesthetic fine-tuning strategy for the Appearance Expert to maintain visual identity and achieve state-of-the-art performance in pose-driven image animation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MACE-Dance significantly enhances the quality of music-driven dance video generation and sets a new state-of-the-art benchmark by effectively capturing both motion and appearance aspects, supported by a curated large-scale dataset and an innovative evaluation protocol.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2512.18181" target="_blank">https://huggingface.co/papers/2512.18181</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260511233026094.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260511233131809.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260511234022743.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260508</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260508/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Sat, 09 May 2026 00:41:08 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260508/</guid>

					<description><![CDATA[1. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning 🔑 Keywords: Skill1, skill selection, skill library, task-outcome objective, reinforcement learning 💡 [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Skill1, skill selection, skill library, task-outcome objective, reinforcement learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop Skill1, a unified framework that trains a single policy for skill selection, utilization, and distillation, achieving superior performance in complex tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Skill1 framework co-evolves skills by generating queries, re-ranking skill library candidates, solving tasks, and distilling new skills based on a shared task-outcome objective.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Skill1 outperforms existing skill-based and reinforcement learning models in environments like ALFWorld and WebShop, with evidence showing the effective co-evolution of capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06130" target="_blank">https://huggingface.co/papers/2605.06130</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233008131.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. Continuous Latent Diffusion Language Model</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Hierarchical Latent Diffusion Language Model, Non-autoregressive Inductive Bias, Global Semantic Prior, Text Generation, Scaling Behavior</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose Cola DLM, a hierarchical latent diffusion language model, as a principled alternative for efficient text generation, moving beyond traditional autoregressive paradigms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a hierarchical information decomposition approach with text-to-latent mapping, global semantic prior modeling, and conditional decoding.</p>
<p>   &#8211; Conducted experiments with comparisons to autoregressive and LLaDA baselines across 8 benchmarks and 4 research questions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Cola DLM successfully demonstrates flexible non-autoregressive inductive bias, supports semantic compression, and extends across continuous modalities.</p>
<p>   &#8211; The results suggest hierarchical continuous latent prior modeling may offer superior generation quality and scaling behavior than token-level approaches, pointing towards unified modeling for discrete and continuous modalities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06548" target="_blank">https://huggingface.co/papers/2605.06548</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233036972.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Heterogeneous Ensemble, Large Language Models, GPT-4o-mini, AI-Generated Summary, Domain-Adapted Model</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aimed to achieve top performance in the SemEval-2026 MTRAGEval task using a diverse ensemble of language models with dual prompting strategies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A heterogeneous ensemble of seven large language models was employed, featuring different prompting variants. A GPT-4o-mini judge selected the best candidate per instance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The model diversity was crucial for performance, consistently outperforming any single model. Introduction of Meno-Lite-0.1 highlighted a strong cost-performance trade-off, and the study provided insights into MTRAGEval annotation limitations and potential improvements.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04523" target="_blank">https://huggingface.co/papers/2605.04523</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233059492.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. When to Trust Imagination: Adaptive Action Execution for World Action Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: World Action Models, Future-Reality Verification, Adaptive WAM Execution, Future Forward Dynamics Causal Attention, Mixture-of-Horizon Training</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the execution of World Action Models (WAMs) by introducing adaptive mechanisms that ensure the predicted future remains consistent with real-world observations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of Future Forward Dynamics Causal Attention (FFDC) to verify consistency between predicted and real observations, enabling adaptive action execution.</p>
<p>   &#8211; Introduction of Mixture-of-Horizon Training to improve the coverage of long-horizon trajectories in robotic manipulation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method improves the robustness-efficiency trade-off, reducing WAM forward passes and execution time while increasing the success rate in both benchmark and real-world experiments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06222" target="_blank">https://huggingface.co/papers/2605.06222</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233130481.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. SkillOS: Learning Skill Curation for Self-Evolving Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SkillOS, self-evolving agents, skill curation, composite rewards, SkillRepo</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to develop SkillOS, a reinforcement learning framework for enabling self-evolving LLM agents to learn complex long-term skill curation policies that improve performance across diverse tasks and executor architectures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SkillOS pairs a frozen agent executor with a trainable skill curator, utilizing composite rewards and grouped task streams to learn from skill-relevant task dependencies, updating an external SkillRepo based on experience.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SkillOS consistently surpasses both memory-free and memory-based baselines in effectiveness and efficiency, with learnings from the skill curator generalizing well across different executor backbones and task domains, producing more targeted skill use and evolving higher-level meta-skills.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06614" target="_blank">https://huggingface.co/papers/2605.06614</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233153697.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: empirical loop, lineage feedback, specialist agents, program-level recipe edits, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate an empirical loop that autonomously refines code through external evaluation feedback without human intervention.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize specialist agents to create trials that include code edits and evaluations, iterating over an auditable trajectory of proposals and experiments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated significant improvements in various tasks such as Parameter Golf validation, NanoChat-D12 CORE, and CIFAR-10 Airbench96 wallclock time without human proposal or intervention.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05724" target="_blank">https://huggingface.co/papers/2605.05724</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233245595.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. Audio-Visual Intelligence in Large Foundation Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Audio-Visual Intelligence, large foundation models, multimodal data, cross-modal fusion, Audio-Visual Intelligence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to provide a comprehensive review of Audio-Visual Intelligence (AVI) through the lens of large foundation models, establishing a unified taxonomy for understanding, generation, and interaction tasks within this multidisciplinary field.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The survey synthesizes methodological foundations such as modality tokenization, cross-modal fusion, autoregressive and diffusion-based generation, and large-scale pretraining, among others, to structure and integrate diverse tasks and practices in AVI.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; A coherent framework is established offering structured comparisons across task families, identifying open challenges in synchronization, spatial reasoning, controllability, and safety. The survey underscores the importance of unified audio-vision architectures for future research in large-scale AVI.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04045" target="_blank">https://huggingface.co/papers/2605.04045</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233221069.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ScaleLogic, Reinforcement Learning, Logical Reasoning, Scaling Exponent, Curriculum-Based Training</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to systematically study the scaling of reinforcement learning training compute with task difficulty using a new framework, ScaleLogic.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced ScaleLogic, a synthetic logical reasoning framework, allowing independent control over reasoning depth and logical expressiveness. Analysis conducted on the scaling of reinforcement learning with these factors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Reinforcement Learning compute scales by a power law with reasoning depth, with scaling exponent increasing with logical expressiveness. More expressive training leads to larger performance gains and compute efficiency in downstream tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06638" target="_blank">https://huggingface.co/papers/2605.06638</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233307773.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ReflectDrive-2, autonomous driving, discrete diffusion planner, reinforcement learning, trajectory revision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces ReflectDrive-2, aiming to improve autonomous driving by enabling efficient trajectory revision through a masked discrete diffusion planner and parallel decoding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a two-stage training approach, combining structure-aware perturbations to refine trajectories and reinforcement learning to enhance trajectory revision and decision-making.</p>
<p>   &#8211; Implements a decision&#8211;draft&#8211;reflect pipeline co-designed with a reflective decoding stack to optimize performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ReflectDrive-2 demonstrates significant improvement in PDMS performance, reaching 91.0 with camera-only input and 94.8 in an oracle setup using NAVSIM, with a low average latency of 31.8 ms on NVIDIA Thor.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04647" target="_blank">https://huggingface.co/papers/2605.04647</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233334585.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SwiftI2V, High-resolution I2V, Conditional Segment-wise Generation, bidirectional contextual interaction, token budget  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The main objective of this research is to develop an efficient high-resolution image-to-video (I2V) generation framework that addresses existing challenges and achieves scalable, input-faithful video synthesis with reduced computational requirements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SwiftI2V employs a two-stage design with Conditional Segment-wise Generation to synthesize videos segment-by-segment, using a bounded per-step token budget to improve efficiency.</p>
<p>   &#8211; The approach utilizes bidirectional contextual interaction to enhance cross-segment coherence and input fidelity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SwiftI2V significantly reduces the computational load, achieving a 202x reduction in total GPU-time, and enables practical 2K I2V generation on both datacenter and consumer GPUs, maintaining performance comparable to end-to-end baselines.  </p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06356" target="_blank">https://huggingface.co/papers/2605.06356</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260508233355994.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. RemoteZero: Geospatial Reasoning with Zero Human Annotations</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RemoteZero, Geospatial Reasoning, MLLM, Self-Evolution, Semantic Verification</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce RemoteZero to enable geospatial reasoning without box supervision, utilizing semantic verification capabilities of MLLMs to improve localization from unlabeled remote sensing data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Leverage the discriminative ability of MLLMs for semantic verification instead of traditional geometric supervision to facilitate GRPO training without box annotations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RemoteZero achieves competitive performance against strong supervised methods, showcasing the potential of self-verifying training for geospatial reasoning and localization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04451" target="_blank">https://huggingface.co/papers/2605.04451</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233423683.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. The Scaling Properties of Implicit Deductive Reasoning in Transformers</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Deep Transformers, bidirectional masking, implicit deductive reasoning, Horn clauses, algorithmic alignment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the scaling properties of implicit deductive reasoning in depth-bounded Transformers using bidirectional masking.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Systematic decorrelation of provability from spurious features and enforcing algorithmic alignment in deep models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Implicit reasoning in sufficiently deep models with bidirectional prefix masking can closely match explicit chain-of-thought performance across various graph structures and problem sizes. However, explicit chain-of-thought methods remain necessary for depth extrapolation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04330" target="_blank">https://huggingface.co/papers/2605.04330</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233509206.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. Prescriptive Scaling Laws for Data Constrained Training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Data-Constrained Regimes, Overfitting Penalty, Compute-Optimal Allocation, Weight Decay, Scaling Law</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research objective is to modify the Chinchilla scaling law to account for data repetition effects and provide compute-optimal training strategies in data-constrained scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The methods involve modeling excess loss under data repetition with an additive overfitting penalty, allowing for adjustments in compute allocation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study finds that further data repetition is counterproductive after a certain point, and resources are better allocated to model capacity. The research also concludes that strong weight decay significantly reduces the overfitting coefficient, aligning with recent findings in data-constrained regimes.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01640" target="_blank">https://huggingface.co/papers/2605.01640</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233446582.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. PianoCoRe: Combined and Refined Piano MIDI Dataset</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PianoCoRe, music information retrieval, MIDI, note-level alignment, expressive performance modeling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces PianoCoRe, a large-scale piano MIDI dataset, to enhance applications in music information retrieval by providing diverse performances and note-level alignments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors curated and refined major open-source piano corpora, resulting in a dataset with 250,046 performances and tiered subsets to support various applications. They also developed a MIDI quality classifier and the RAScoP alignment refinement pipeline.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PianoCoRe reduces temporal noise and eliminates tempo outliers, showing improved robustness in expressive performance modeling compared to models trained on smaller datasets. This positions PianoCoRe as a comprehensive resource for future piano performance research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06627" target="_blank">https://huggingface.co/papers/2605.06627</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233600824.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal Domain Generalization, AI-generated summary, action recognition, sentiment analysis, neural networks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduction of MMDG-Bench, a unified benchmark for evaluating Multimodal Domain Generalization (MMDG) across diverse tasks and modalities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Standardized evaluation using six datasets across three tasks: action recognition, mechanical fault diagnosis, and sentiment analysis. It involves six modality combinations and nine representative methods, with systematic assessment criteria including corruption robustness and missing-modality generalization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Specialized MMDG methods show only marginal improvements over the ERM baseline.</p>
<p>   &#8211; No single method consistently outperforms across all settings, with a significant performance gap remaining.</p>
<p>   &#8211; Trimodal fusion does not consistently surpass bimodal configurations.</p>
<p>   &#8211; All methods suffer notably under corruption and missing-modality scenarios, affecting model trustworthiness.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06643" target="_blank">https://huggingface.co/papers/2605.06643</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233534467.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MLLMs, handwritten solutions, auto-grading, upstream recognition, AI-enabled grading system</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Education</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to evaluate the capabilities of Multimodal Large Language Models (MLLMs) in interpreting complex STEM handwritten student solutions to improve educational grading systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Released EDU-CIRCUIT-HW dataset with 1,300+ authentic handwritten student solutions. </p>
<p>   &#8211; Conducted evaluations using expert-verified transcriptions and grading reports, assessing both upstream recognition fidelity and downstream auto-grading performance of various MLLMs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MLLMs exhibit significant limitations in understanding complex handwritten logic, affecting their reliability for auto-grading.</p>
<p>   &#8211; A proposed hybrid approach, combining error detection with minimal human oversight, can enhance AI-enabled grading robustness.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2602.00095" target="_blank">https://huggingface.co/papers/2602.00095</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233718055.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. Generative Quantum-inspired Kolmogorov-Arnold Eigensolver</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Quantum Chemistry, Kolmogorov-Arnold eigensolver, Quantum-inspired, HPC, Strongly Correlated Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Quantum Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to reduce classical computational overhead in quantum chemistry workflows while maintaining accuracy and improving convergence for strongly correlated systems using a generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; GQKAE is a parameter-efficient extension of the generative quantum eigensolver, replacing parameter-heavy networks with hybrid quantum-inspired Kolmogorov-Arnold modules. It utilizes single-qubit DatA Re-Uploading ActivatioN modules for expressive mappings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GQKAE achieves chemical accuracy comparable to GPT-based architectures while reducing trainable parameters and memory by approximately 66%. It also enhances convergence and final energy errors for strongly correlated systems like N2 and LiH, offering a scalable approach for HPC-quantum co-design on near-term quantum platforms.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04604" target="_blank">https://huggingface.co/papers/2605.04604</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233651955.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Background Replacement, Video Editing, Foreground-Background Interactions, Data Synthesis, Evaluation Benchmark</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a new dataset and benchmark for background replacement in video editing, addressing the limitations in existing datasets with a scalable pipeline.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Designed a pipeline that decouples foreground and background guidance with strict quality filtering to generate high-quality datasets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Sparkle dataset and the model trained on it show substantially better performance than existing baselines on both OpenVE-Bench and Sparkle-Bench, filling a significant gap in background replacement tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06535" target="_blank">https://huggingface.co/papers/2605.06535</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233626384.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. TIDE: Every Layer Knows the Token Beneath the Context</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TIDE, EmbeddingMemory, Rare Token Problem, Contextual Collapse Problem</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address limitations in large language model (LLM) design by introducing TIDE to mitigate the Rare Token and Contextual Collapse Problems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; TIDE augments the transformer with an EmbeddingMemory system which involves independent MemoryBlocks that map token indices to context-free semantic vectors, injected at each layer through a depth-conditioned softmax router.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TIDE effectively addresses issues related to single-token identity injection, enhancing performance in language modeling and various downstream tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06216" target="_blank">https://huggingface.co/papers/2605.06216</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233740135.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605081778283468.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Biomedical tool-calling, Large Language Model, AI Native, Fine-tuning, BioTool</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop BioTool, a comprehensive biomedical tool-calling dataset aimed at improving the performance of large language models (LLMs) in specialized biomedical domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Fine-tuning a 4-billion-parameter LLM on the BioTool dataset, which includes 34 tools and 7,040 human-verified API call pairs covering areas like genomics and proteomics.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; BioTool fine-tuning significantly boosts LLM performance in biomedical tool-calling, surpassing commercial alternatives like GPT-5.1, and enhances downstream answer quality as per human expert evaluations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05758" target="_blank">https://huggingface.co/papers/2605.05758</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233729267.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. Recovering Hidden Reward in Diffusion-Based Policies</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EnergyFlow, inverse reinforcement learning, reward extraction, policy generalization, structural constraints</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces EnergyFlow, a framework designed to unify generative action modeling with inverse reinforcement learning to enhance reward extraction and policy generalization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors parameterize a scalar energy function with its gradient acting as a denoising field, facilitating reward extraction without adversarial training.</p>
<p>   &#8211; The framework capitalizes on maximum-entropy optimality and denoising score matching to recover expert&#8217;s soft Q-function gradients.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EnergyFlow achieves state-of-the-art performance in imitation tasks and provides a robust reward signal for reinforcement learning, outperforming traditional IRL methods.</p>
<p>   &#8211; Structural constraints aid in reducing hypothesis complexity and improving out-of-distribution generalization, serving as inductive biases for policy generalization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.00623" target="_blank">https://huggingface.co/papers/2605.00623</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233707212.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Safety Scoring, Scenario-Based Audit, Instrumental-Validity Chain, AUROC, Local-First Scoring</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to establish a method for benchmarkless comparative safety scoring to evaluate language model safety in the absence of pre-existing labeled benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes scenario-based audits and an instrumental-validity chain to assess responsiveness, variance dominance, and stability. Demonstrated with a local-first scoring instrument, SimpleAudit, validated on a Norwegian safety pack.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research demonstrates that in the Norwegian public-sector case, safety assessments vary based on scenario category and risk measure. It stresses that scores and related metrics must be reported collectively for meaningful deployment evidence.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06652" target="_blank">https://huggingface.co/papers/2605.06652</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233637213.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: GeoStack, Vision-Language Models, domain experts, adapter manifold, catastrophic forgetting</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Address the challenge of knowledge composition in Vision-Language Models without causing catastrophic forgetting by using GeoStack.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce a modular framework, GeoStack, which enforces geometric and structural constraints on adapter manifolds.</p>
<p>   &#8211; Demonstrate a weight-folding property that ensures O(1) inference complexity regardless of the number of domain experts integrated.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GeoStack effectively provides long-term knowledge composition, significantly mitigating catastrophic forgetting while enabling efficient multi-domain adaptation and class-incremental learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06477" target="_blank">https://huggingface.co/papers/2605.06477</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233614303.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>25. Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: video reward models, Chain-of-Thought reasoning, decoupled think-then-score, reinforcement learning, multimodal large language models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance the training efficiency and generalization of video reward models by decoupling the thinking and scoring processes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers introduced DeScore, a two-stage framework that includes a discriminative cold start with a random mask mechanism and dual-objective reinforcement learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DeScore improves interpretability and generalization by using a &#8220;think-then-score&#8221; paradigm, enhancing the model&#8217;s reasoning quality and ensuring alignments with human preferences.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05922" target="_blank">https://huggingface.co/papers/2605.05922</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233550070.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>26. Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Balanced Aggregation, Reinforcement Learning, Token-Level Policy Gradient, Training Stability, Final Performance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance reinforcement learning with verifiable rewards by addressing optimization biases in token-level policy gradient aggregation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of Balanced Aggregation, computing token-level means within positive and negative subsets and combining them with sequence-count-based weights.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Balanced Aggregation improves training stability and performance over standard token and sequence aggregation, highlighting the critical role of aggregation in GRPO-style reinforcement learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04077" target="_blank">https://huggingface.co/papers/2605.04077</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233522040.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>27. KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: KernelBench-X, Triton kernel generation, correctness, iterative refinement, hardware efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate where LLM-generated Triton kernel generation capability breaks down and why, utilizing the KernelBench-X benchmark.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Systematic comparison of five representative methods across 176 tasks in 15 categories focusing on category-aware evaluation of correctness and hardware efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Task structure affects kernel correctness more than method design.</p>
<p>   &#8211; Iterative refinement improves correctness but at the cost of performance.</p>
<p>   &#8211; Correctness does not guarantee efficiency; significant variance in kernel performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04956" target="_blank">https://huggingface.co/papers/2605.04956</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233457222.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>28. The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language models (LLMs), social roles, Granularity Axis, hidden states, activation steering</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate whether large language models encode the granularity of social roles, ranging from individual to organizational levels, in their internal representations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Define a contrast-based Granularity Axis to analyze role representation space and perform projections of role-level hidden states.</p>
<p>   &#8211; Conduct experiments with Qwen3-8B and Llama-3.1-8B-Instruct, constructing 75 social roles across five granularity levels and collecting 91,200 role-conditioned responses.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study finds that social role granularity is a structured and causally manipulable latent direction in the behavior of language models.</p>
<p>   &#8211; Activation steering along the Granularity Axis effectively shifts response granularity, with differences in controllability between models suggesting variability in default operating regimes.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06196" target="_blank">https://huggingface.co/papers/2605.06196</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233434788.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>29. AI Co-Mathematician: Accelerating Mathematicians with Agentic AI</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI co-mathematician, mathematical workflows, theorem proving, stateful workspace, problem-solving benchmarks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To create an interactive platform, AI co-mathematician, that assists mathematicians in open-ended research utilizing AI agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Provides a holistic support system for workflows including ideation, literature search, theorem proving, and theory building through a stateful, asynchronous workspace.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The AI co-mathematician successfully aids researchers in solving open problems, discovering new research directions, and identifying overlooked literature. It achieves state-of-the-art results, including a 48% score on FrontierMath Tier 4 benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06651" target="_blank">https://huggingface.co/papers/2605.06651</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233410723.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>30. TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TabEmbed, Tabular Embedding Benchmark, Semantic Matching, Contrastive Learning, Universal Tabular Representation Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce TabEmbed, a generalist embedding model for unifying tabular classification and retrieval within a shared embedding space.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employ large-scale contrastive learning with positive-aware hard negative mining to address tabular tasks reformulated as semantic matching problems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TabEmbed significantly outperforms existing text embedding models on TabBench, setting a new standard for universal tabular representation learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04962" target="_blank">https://huggingface.co/papers/2605.04962</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233344943.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>31. UniPool: A Globally Shared Expert Pool for Mixture-of-Experts</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mixture-of-Experts, Shared Expert Pool, Stable Routing, Parameter Growth, Depth Scaling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The goal of this research is to introduce UniPool, a novel shared expert pool architecture for Mixture-of-Experts (MoE) models, aimed at reducing parameter growth with depth while maintaining or improving model performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; UniPool employs a global shared pool of expert capacity, accessed via independent per-layer routers. The design incorporates stable and balanced training mechanisms, including a pool-level auxiliary loss and the NormRouter for scale-stable routing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; UniPool consistently improves validation loss and perplexity over conventional MoE baselines across various model scales, demonstrating that expert parameters can grow sublinearly under a shared-pool architecture, enhancing efficiency and effectiveness without linear parameter expansion.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06665" target="_blank">https://huggingface.co/papers/2605.06665</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233322607.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>32. A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, Agentic LLMs, Information Gain, Policy Optimization, Adaptive Turn-level Clipping</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve policy optimization for agentic large language models (LLMs) experiencing sparse rewards and credit assignment issues through a novel approach, A²TGPO.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes Information Gain as an intrinsic process signal while introducing turn-group normalization, variance-rescaled discounted accumulation, and adaptive turn-level clipping to optimize policy updates.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed A²TGPO method effectively redesigns the process of information gain normalization, accumulation, and clipping to better evaluate and optimize policy in reinforcement learning for agentic LLMs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06200" target="_blank">https://huggingface.co/papers/2605.06200</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233256724.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>33. StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Strategic Trajectory Abstraction, trajectory-level strategy, reinforcement learning, sample efficiency, final performance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce a framework named Strategic Trajectory Abstraction (StraTA) to enhance long-horizon decision making in large language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of StraTA that uses a combination of trajectory-level strategy and hierarchical GRPO-style rollout design, with enhancements like diverse strategy rollout and critical self-judgment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; StraTA significantly improves sample efficiency and performance, achieving high success rates of 93.1% on ALFWorld and 84.2% on WebShop, and a 63.5% overall score in SciWorld, surpassing existing strong baselines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06642" target="_blank">https://huggingface.co/papers/2605.06642</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233232624.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>34. Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, verifiable rewards, Large Language Models, zero-advantage problem, Lorem Perturbation for Exploration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the zero-advantage problem in reinforcement learning when training Large Language Models by introducing a novel approach called Lorem Perturbation for Exploration (LoPE).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research proposes using prompt-space perturbations by prepending sequences from the Lorem Ipsum vocabulary to prompts to enhance exploration in model training across various model sizes (1.7B, 4B, and 7B).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LoPE significantly improves exploration success rates compared to conventional resampling methods, establishing itself as an effective baseline for enhancing exploration in LLM reinforcement learning applications.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05566" target="_blank">https://huggingface.co/papers/2605.05566</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233209432.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>35. Continuous-Time Distribution Matching for Few-Step Diffusion Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continuous-Time Distribution Matching, diffusion model distillation, velocity field extrapolation, visual fidelity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Continuous-Time Distribution Matching (CDM) to transition diffusion model distillation from discrete to continuous optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Two continuous-time designs, including a dynamic continuous schedule and continuous-time alignment objective, are utilized to enforce distribution matching at arbitrary points on sampling trajectories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CDM enhances visual fidelity for few-step image generation without complex auxiliary objectives, as evidenced by experiments on architectures like SD3-Medium and Longcat-Image.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06376" target="_blank">https://huggingface.co/papers/2605.06376</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233142288.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>36. MARBLE: Multi-Aspect Reward Balance for Diffusion RL</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MARBLE, multi-reward reinforcement learning, diffusion models, quadratic programming, policy gradients</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address limitations in multi-reward reinforcement learning fine-tuning of diffusion models by leveraging a gradient-space optimization framework without manual reward weighting.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of MARBLE, which maintains independent advantage estimators and harmonizes policy gradients through solving a quadratic programming problem. Introduces an amortized formulation to reduce computational costs while stabilizing updates with EMA smoothing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MARBLE improves alignment across all reward dimensions on a test model with five rewards, turning the worst-aligned reward&#8217;s gradient cosine consistently positive in the majority of mini-batches, while maintaining nearly baseline training speed.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06507" target="_blank">https://huggingface.co/papers/2605.06507</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233113317.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>37. MiA-Signature: Approximating Global Activation for Long-Context Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mindscape Activation Signature, compressed representation, global activation pattern, long-context understanding, computational efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose a compressed representation method, termed as Mindscape Activation Signature (MiA-Signature), for approximating global activation states in large language models while retaining computational efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing MiA-Signature through submodular-based selection of high-level concepts, and refining it using lightweight iterative updates with the aid of working memory.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Integration of MiA-Signatures into RAG and agentic systems shows consistent performance gains in long-context understanding tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.06416" target="_blank">https://huggingface.co/papers/2605.06416</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233048443.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>38. Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Direct corpus interaction, Agentic search, Modern retrieval systems, BRIGHT and BEIR datasets, Multi-hop QA</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve agentic search effectiveness by enabling agents to directly query raw text, surpassing limitations of traditional retrieval methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs direct corpus interaction (DCI) using general-purpose terminal tools without reliance on embedding models or retrieval APIs. </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Direct corpus interaction significantly outperforms existing retrieval methods across various IR benchmarks and agentic search tasks, offering a broader interface-design space for enhanced retrieval and reasoning ability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05242" target="_blank">https://huggingface.co/papers/2605.05242</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260508233020496.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260508233355994.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260507</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260507/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Fri, 08 May 2026 00:41:02 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260507/</guid>

					<description><![CDATA[1. Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation 🔑 Keywords: Video Diffusion Model, Reliability, Perplexity, Distillation, Visual Quality 💡 Category: [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Video Diffusion Model, Reliability, Perplexity, Distillation, Visual Quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Stream-R1 aims to enhance video diffusion model distillation by adaptively weighting supervision based on reliability and perplexity, improving visual quality, motion quality, and text alignment without extra computational cost.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces a Reliability-Perplexity Aware Reward Distillation framework, Stream-R1, which adaptively reweights the objective across rollout and spatiotemporal-element levels through a reward-guided mechanism.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Stream-R1 consistently improves visual, motion quality, and text alignment in video generation compared to traditional distillation approaches, achieving this without architectural changes or additional inference costs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03849" target="_blank">https://huggingface.co/papers/2605.03849</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233010733.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. RLDX-1 Technical Report</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RLDX-1, Multi-Stream Action Transformer, dexterous manipulation, modality integration, real-time deployment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research introduces RLDX-1, a robotic policy designed for dexterous manipulation, to improve performance over existing Vision-Language-Action models in complex real-world tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a Multi-Stream Action Transformer architecture to integrate heterogeneous modalities through modality-specific streams with cross-modal joint self-attention.</p>
<p>   &#8211; Implementation of system-level design choices, including synthetic training data for rare scenarios and optimization for real-time deployment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RLDX-1 outperforms recent Vision-Language-Action models in both simulation benchmarks and real-world tasks, particularly achieving high success rates in humanoid tasks, indicating its advancements in controlling high-DoF humanoid robots under diverse demands.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03269" target="_blank">https://huggingface.co/papers/2605.03269</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260507233049164.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: HERMES++, 3D scene understanding, future geometry prediction, BEV representation, Large Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop HERMES++, a unified model integrating 3D scene understanding and future geometry prediction for autonomous driving.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes BEV representation to integrate multi-view spatial information.</p>
<p>   &#8211; Introduces LLM-enhanced queries and temporal linking to connect semantic and geometric understanding.</p>
<p>   &#8211; Employs joint geometric optimization to align model predictions with geometry-aware priors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; HERMES++ shows superior performance over specialized methods in future point cloud prediction and 3D scene understanding.</p>
<p>   &#8211; The model and associated code will be open-sourced for further research and application.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.28196" target="_blank">https://huggingface.co/papers/2604.28196</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233135557.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: BRIGHT-Pro, RTriever-Synth, Reasoning-intensive retrieval, Agentic search systems, LoRA fine-tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce BRIGHT-Pro to expand expert-annotated reasoning-intensive retrieval benchmarks and RTriever-Synth as an aspect-decomposed synthetic corpus to enhance retriever performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implement agentic search evaluation and LoRA fine-tuning on RTriever-4B from Qwen3-Embedding-4B using BRIGHT-Pro and RTriever-Synth.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Aspect-aware and agentic evaluation methods reveal behaviors in retrievers not captured by standard metrics, and RTriever-4B shows substantial improvements over its base model.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04018" target="_blank">https://huggingface.co/papers/2605.04018</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233206905.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. Lightning Unified Video Editing via In-Context Sparse Attention</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: In-Context Learning, Video Editing, Sparse Attention, Query Sharpness, LIVEditor</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces In-context Sparse Attention (ISA), a framework designed to reduce computational costs and maintain visual quality in video editing using In-Context Learning paradigms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The ISA approach includes a pre-selection strategy that prunes redundant context and a dynamic query grouping mechanism. This mechanism routes high-error queries to full attention and low-error queries to a computationally efficient 0-th order Taylor sparse attention.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LIVEditor, a model built using ISA and a specialized video-editing data pipeline, achieves a 60% reduction in attention-module latency while surpassing current state-of-the-art methods without compromising visual fidelity.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04569" target="_blank">https://huggingface.co/papers/2605.04569</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233245000.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MiniCPM-o 4.5, Omni-Flow, Real-time Streaming Interaction, Human-like Multimodal Interaction, Edge Devices</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; MiniCPM-o 4.5 aims to achieve real-time full-duplex multimodal interaction for more human-like AI engagement.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of Omni-Flow, a unified streaming framework, aligning inputs and outputs temporally to enable simultaneous perception and response.</p>
<p>   &#8211; Implementation involves an architecture design optimized for efficiency and low RAM cost on edge devices.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MiniCPM-o 4.5 demonstrates superior performance in omni-modal understanding and speech generation, outperforming existing models like Qwen3-Omni-30B-A3B.</p>
<p>   &#8211; The model exhibits proactive behavior in a multimodal environment and maintains computation efficiency with higher performance capabilities at its parameter scale.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.27393" target="_blank">https://huggingface.co/papers/2604.27393</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233324549.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ResRL, Large Language Models (LLMs), negative sample projection, diversity, reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of this research is to improve the reasoning capabilities of Large Language Models (LLMs) while maintaining diversity by decoupling semantic distributions between positive and negative responses using a method called ResRL.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method involves applying Residual Reinforcement Learning (ResRL) which uses negative sample projection to decouple similar semantic distributions among responses. This is achieved by linking Lazy Likelihood Displacement to mitigate head-gradient interference and using SVD-based low-rank positive subspace projection to enhance projection residuals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ResRL improves reasoning capabilities while preserving response diversity, outperforming existing methods on various benchmarks, including notable improvements in mathematical reasoning surpassing NSR&#8217;s performance on Avg@16 and Pass@128 scores.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.00380" target="_blank">https://huggingface.co/papers/2605.00380</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233403487.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated music, Aesthetic Quality, Multi-task learning, Frozen Audio Embeddings, Music Popularity Prediction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop APEX, a large-scale multi-task learning framework for predicting both popularity and aesthetic quality of AI-generated music.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized frozen audio embeddings from MERT, a self-supervised music understanding model.</p>
<p>   &#8211; Trained on a dataset comprising over 211k songs from Suno and Udio.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; APEX demonstrates strong generalization across different generative architectures.</p>
<p>   &#8211; Incorporating aesthetic features improves preference prediction in out-of-distribution evaluations involving human preference battles.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03395" target="_blank">https://huggingface.co/papers/2605.03395</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233442630.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Medical Research Agent Skills, Audit Framework, Expert Review, AI in Healthcare, MedSkillAudit</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop and evaluate a domain-specific audit framework for assessing the reliability and readiness of medical research agent skills in healthcare applications.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed MedSkillAudit, a layered framework assessing skill release readiness.</p>
<p>   &#8211; Evaluated 75 skills across five medical research categories.</p>
<p>   &#8211; Compared system-expert agreement using ICC and Cohen&#8217;s kappa with human inter-rater baseline.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MedSkillAudit demonstrated higher reliability than human inter-rater assessments with an ICC of 0.449.</p>
<p>   &#8211; Highlighted the practical benefits of domain-specific pre-deployment audits for medical research agent skills.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.20441" target="_blank">https://huggingface.co/papers/2604.20441</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233517131.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Autoregressive Models, SxS Interleaved Reasoning, AI-generated Summary, Private Reasoning, Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to implement Side-by-Side (SxS) Interleaved Reasoning in autoregressive models to improve the accuracy and efficiency by controlling disclosure timing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The approach involves constructing entailment-aligned interleaved trajectories using answer prefixes matched with supporting reasoning prefixes, and training with SFT (Supervised Fine-Tuning) and RL (Reinforcement Learning).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The SxS Interleaved Reasoning method improves the accuracy-content-latency Pareto trade-offs in various benchmarks, enhancing performance under the new format across different architectures and domains.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03314" target="_blank">https://huggingface.co/papers/2605.03314</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233552184.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: KinDER, physical reasoning, kinematic constraints, dynamic constraints, reinforcement learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce KinDER as a benchmark for Kinematic and Dynamic Embodied Reasoning, focusing on physical reasoning challenges in robot learning and planning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized procedurally generated environments and baselines across multiple learning paradigms, including motion planning, imitation learning, reinforcement learning, and foundation-model-based approaches.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Empirical evaluation reveals that current methods struggle with many environments, highlighting substantial gaps in existing physical reasoning approaches. Additionally, real-to-sim-to-real experiments assess simulation and real-world interaction correspondence, with KinDER being open-sourced for systematic comparison in robotics.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25788" target="_blank">https://huggingface.co/papers/2604.25788</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233634922.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Environments</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Autonomous Preference Optimization, reasoning alignment, multi-modal large language models, constraint-aware optimization, concept drift</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address reasoning alignment challenges in multi-modal large language models under conditions of concept drift, improving robustness and performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proposed a novel framework named Autonomous Preference Optimization (APO), treating inter-model divergences as dynamic negative constraints with a two-stage protocol involving supervised bootstrapping and constraint-aware optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated superior robustness and accuracy in chest X-ray interpretation with the APO framework, outperforming proprietary source models. Released the CXR-MAX benchmark to support further research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2510.04142" target="_blank">https://huggingface.co/papers/2510.04142</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233721348.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605071778197052.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. TT4D: A Pipeline and Dataset for Table Tennis 4D Reconstruction From Monocular Videos</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: 3D ball trajectories, learned lifting network, monocular broadcast videos, AI-generated summary, table tennis dataset</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop a high-fidelity table tennis dataset (TT4D) reconstructed from monocular broadcast videos to enable virtual replay, player analysis, and robot learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize a novel reconstruction pipeline using a learned lifting network to transform unsegmented 2D ball tracks into 3D trajectories, aiding in time segmentation and spin estimation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The pipeline reliably reconstructs 3D ball trajectories and spin, even under high occlusion, and uniquely supports estimating racket pose &amp; velocity and generative modeling of rallies from general-view videos.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01234" target="_blank">https://huggingface.co/papers/2605.01234</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233702889.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: large language models, creative problem-solving, affordance reasoning, affordance-based creativity, CreativityBench</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate the creative problem-solving abilities of large language models, particularly in novel tool usage through affordance reasoning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced CreativityBench, a benchmark for evaluating affordance-based creativity.</p>
<p>   &#8211; Developed a large-scale affordance knowledge base with 4,000 entities and 150,000+ annotations.</p>
<p>   &#8211; Constructed 14,000 grounded tasks requiring physically plausible solutions under constraints.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current models can select plausible objects but struggle with identifying specific parts and underlying physical mechanisms.</p>
<p>   &#8211; Scaling models does not significantly enhance creative affordance discovery, indicating limits in current AI capabilities for novel tool use.</p>
<p>   &#8211; CreativityBench offers a valuable platform for examining this aspect of intelligence, with implications for future AI planning and reasoning modules.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02910" target="_blank">https://huggingface.co/papers/2605.02910</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233613431.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. The First Token Knows: Single-Decode Confidence for Hallucination Detection</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: first-token confidence, phi_first, semantic self-consistency, hallucinations, AUROC</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate the effectiveness of first-token confidence (phi_first) in detecting hallucinations compared to semantic self-consistency and other methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Comparison of phi_first calculated from the normalized entropy of top-K logits during greedy decode with semantic self-consistency across three instruction-tuned models and two benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Phi_first matches or exceeds semantic self-consistency in performance for closed-book short-answer factual question answering, showing it as a computationally efficient alternative.</p>
<p>   &#8211; Phi_first presents a strong correlation with semantic agreement, suggesting that initial token distribution captures sufficient uncertainty information.</p>
<p>   &#8211; Recommends phi_first as a baseline before utilizing sampling-based uncertainty estimation methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05166" target="_blank">https://huggingface.co/papers/2605.05166</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233534936.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. Diffusion Model as a Generalist Segmentation Learner</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Diffusion models, Semantic segmentation, Open-vocabulary segmentation, Latent space, CLIP-aligned text pathway</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to demonstrate that pretrained diffusion models can be adapted for semantic and open-vocabulary segmentation tasks across diverse domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a technique called DiGSeg (Diffusion Models as a Generalist Segmentation Learner), encoding input images and masks into latent space and using a diffusion U-Net. Integrates a parallel CLIP-aligned text pathway for language feature alignment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research shows that modern diffusion backbones can serve as generalist segmentation learners, achieving state-of-the-art performance without the need for domain-specific architectural customization, thereby narrowing the gap between visual generation and understanding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.24575" target="_blank">https://huggingface.co/papers/2604.24575</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233458643.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SWE-WebDev Bench, AI-powered application development, Vibe Coding, Architectural Decisions</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces SWE-WebDev Bench to evaluate AI-powered application development platforms on multiple dimensions such as requirement understanding, architectural decision-making, and production readiness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Evaluation was conducted using a 68-metric framework spanning 25 primary and 43 diagnostic metrics across seven groups. The study assessed six platforms across three domains with a focus on interaction mode, agency angle, and complexity tier.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Four main shortcomings in current AI app builders were highlighted: specification bottleneck, frontend-backend decoupling, steep production-readiness cliff, and security and infrastructure failures. SWE-WebDev Bench was released for community use to address these gaps.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04637" target="_blank">https://huggingface.co/papers/2605.04637</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233427066.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multi-view Proficiency Estimation, SkillFormer, PATS, ProfVLM, Interpretable Feedback Generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to estimate how well a person performs an action rather than identifying the action itself, which is vital for coaching, rehabilitation, and talent identification.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced three methods for multi-view proficiency estimation: SkillFormer, PATS, and ProfVLM, leveraging parameter-efficient architectures, improved temporal sampling, and reformulated proficiency estimation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The methods demonstrate state-of-the-art accuracy on Ego-Exo4D with significant reductions in parameters and training epochs. They highlight a shift toward efficient, multi-view systems that prioritize selective fusion, proficiency-aware sampling, and provide actionable generative feedback.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03848" target="_blank">https://huggingface.co/papers/2605.03848</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233339646.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: JoyAI-Image, Multimodal Large Language Model, Multimodal Diffusion Transformer, Spatial Intelligence, Controllable Visual Synthesis</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; JoyAI-Image aims to achieve unified visual understanding, text-to-image generation, and instruction-guided image editing with enhanced spatial intelligence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The model integrates a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), employing a scalable training strategy that includes unified instruction tuning, long-text rendering supervision, and spatially grounded data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; JoyAI-Image shows state-of-the-art or highly competitive performance across various tasks, enhancing geometry-aware reasoning and controllable visual synthesis, and offers a promising path for applications in vision-language-action systems and world models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04128" target="_blank">https://huggingface.co/papers/2605.04128</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233302773.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: D-OPSD, Supervised Fine-tuning, Few-step Inference, On-policy Learning, Multimodal Features</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a novel training approach, D-OPSD, that facilitates efficient supervised fine-tuning of diffusion models while preserving few-step inference capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing on-policy self-distillation by having the model act as both teacher and student with distinct contexts, utilizing text and multimodal features during training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; D-OPSD enables diffusion models to learn new concepts and styles without compromising their original few-step inference capacity.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05204" target="_blank">https://huggingface.co/papers/2605.05204</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233221705.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PhysForge, Hierarchical Physical Blueprint, physics-grounded diffusion model, KineVoxel Injection</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To overcome the limitations of static geometry by creating interactively functional 3D assets through physics-grounded synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a decoupled two-stage framework including a Visual-Language Model (VLM) for planning and a physics-grounded diffusion model for synthesis, supported by a large-scale dataset, PhysDB.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PhysForge successfully produces functionally plausible, simulation-ready 3D assets, enhancing the potential for interactive virtual worlds and embodied AI.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05163" target="_blank">https://huggingface.co/papers/2605.05163</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260507233148513.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: OpenSearch-VL, multimodal search agents, reinforcement learning, Wiki path sampling, tool environment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop an open-source framework for training advanced multimodal search agents using reinforcement learning, emphasizing the creation of high-quality data and new training algorithms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced a pipeline for data curation using techniques like Wikipedia path sampling and source-anchor visual grounding.</p>
<p>   &#8211; Designed a diverse tool environment that integrates text and image search, as well as various image processing techniques.</p>
<p>   &#8211; Developed a multi-turn fatal-aware GRPO training algorithm to manage tool failures effectively.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OpenSearch-VL successfully enhances performance with over 10-point average improvements across multiple benchmarks, achieving results comparable to proprietary models. The release of data, code, and models supports open research in this domain.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.05185" target="_blank">https://huggingface.co/papers/2605.05185</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233119127.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>24. Stream-T1: Test-Time Scaling for Streaming Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Streaming Video Generation, Temporal Guidance, Diffusion Models, TTS, Temporal Dependency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to improve test-time video generation efficiency by addressing the structural bottlenecks of diffusion model-based methods through the introduction of Stream-T1, a streaming video generation framework.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The Stream-T1 framework includes three components: Stream-Scaled Noise Propagation for temporal dependency, Stream-Scaled Reward Pruning for optimal balance between aesthetics and coherence, and Stream-Scaled Memory Sinking for effective guidance of subsequent video streams.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Evaluations on video benchmarks indicate that Stream-T1 significantly enhances temporal consistency, motion smoothness, and frame-level visual quality compared to existing methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04461" target="_blank">https://huggingface.co/papers/2605.04461</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260507233030673.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260507233049164.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260507233148513.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260506</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260506/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Thu, 07 May 2026 00:40:43 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260506/</guid>

					<description><![CDATA[1. ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration 🔑 Keywords: ARIS, cross-model adversarial collaboration, research harness, execution layer, assurance layer 💡 Category: [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ARIS, cross-model adversarial collaboration, research harness, execution layer, assurance layer</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To ensure reliable long-term research outcomes through the deployment of ARIS, an open-source research harness that utilizes cross-model adversarial collaboration.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; ARIS employs three architectural layers: execution, orchestration, and assurance, with features like Markdown-defined skills, model integrations, a persistent research wiki, and a three-stage verification process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ARIS successfully coordinates machine-learning research workflows and mitigates failure modes in long-horizon research by implementing a default configuration of cross-model adversarial collaboration.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03042" target="_blank">https://huggingface.co/papers/2605.03042</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233010225.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: distributional drift, multimodal models, supervised fine-tuning, reinforcement learning, Mixture-of-Experts</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To tackle distributional drift in large multimodal models by introducing PRISM, a novel pipeline that incorporates a distribution-alignment stage to improve model performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a black-box adversarial game approach between policy and MoE discriminator to provide disentangled corrective signals.</p>
<p>   &#8211; Curates additional high-fidelity supervision data for distribution alignment from Gemini 3 Flash, enriching SFT initialization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PRISM effectively enhances downstream reinforcement learning performance across various algorithms and benchmarks, achieving significant accuracy improvements, highlighting the efficacy of the distribution-alignment approach.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.28123" target="_blank">https://huggingface.co/papers/2604.28123</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233049600.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: HeavySkill, Complex Reasoning, Parallel Reasoning, Reinforcement Learning, Self-Evolving LLMs</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Propose HeavySkill, a framework for internalizing complex reasoning as an intrinsic model skill, outperforming standard orchestration methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce a two-stage pipeline incorporating parallel reasoning and summarization, validated through empirical studies across various domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; HeavySkill consistently surpasses traditional Best-of-N strategies and can be enhanced via reinforcement learning, enabling scalable, self-evolving models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02396" target="_blank">https://huggingface.co/papers/2605.02396</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233120874.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Patent examination, legal reasoning, LLMs, benchmarks, multi-turn process</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces PatRe, a new benchmark designed to model the complete patent examination process as a dynamic multi-turn interaction, addressing previous benchmarks&#8217; limitations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; PatRe utilizes 480 real-world cases and supports both oracle and retrieval-simulated evaluation settings to investigate performance differences between proprietary and open-source LLMs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study reveals insights into model performance, highlighting the potential and limitations of LLMs in legal reasoning and technical novelty assessment within patent examination. It also underscores task asymmetries between examiner analysis and applicant rebuttal.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03571" target="_blank">https://huggingface.co/papers/2605.03571</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233152367.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. SVGS: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Gaussian Splatting, Spatially Varying Colors, Opacity, Novel View Synthesis, Geometric Reconstruction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to improve multi-view reconstruction by introducing Spatially Varying Gaussian Splatting (SVGS) which enhances Gaussian primitives with spatially varying colors and opacity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SVGS utilizes spatially varying functions implemented via bilinear interpolation, movable kernels, and tiny neural networks within 2D Gaussian surfels.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SVGS significantly enhances novel view synthesis and maintains high-quality geometric reconstruction, outperforming baseline methods on multiple datasets, especially with movable kernels demonstrating superior results.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2411.18966" target="_blank">https://huggingface.co/papers/2411.18966</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260506233231560.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Workspace Learning, File Dependencies, AI Agents, Cross-File Retrieval, Contextual Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to evaluate AI agents&#8217; capability in workspace learning focusing on managing large-scale file dependencies and assessing their performance compared to human benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Construction of realistic workspaces featuring multiple worker profiles and extensive file types, coupled with curated tasks requiring complex decision-making.</p>
<p>   &#8211; Introduction of Workspace-Bench and Workspace-Bench-Lite to facilitate in-depth performance evaluations of AI agents using both real-world file dependencies and a reduced-cost benchmarking subset.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experimental results highlight significant performance gaps between AI agents and human capabilities in workspace learning, with agents achieving a maximum of 68.7% versus human performance of 80.7%.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03596" target="_blank">https://huggingface.co/papers/2605.03596</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233312879.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal Large Language Models, Unmanned Aerial Vehicle, Search and Rescue, ESAR, AI Native</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce the Embodied Search and Rescue (ESAR) task and benchmark to evaluate Multimodal Large Language Model-driven UAVs in realistic SAR scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Develop ESARBench using Unreal Engine 5 and AirSim to create photorealistic environments with dynamic variables for simulation.</p>
<p>   &#8211; Construct a dataset of 600 tasks modeled on real-world disasters and propose comprehensive evaluation metrics.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Highlight challenges in ESAR, like spatial memory limitations and the balance between search efficiency and flight safety.</p>
<p>   &#8211; Establish ESARBench as a resource to advance research in the Embodied Search and Rescue domain.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01371" target="_blank">https://huggingface.co/papers/2605.01371</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233448621.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Niche-domain Indic ASR, Text-to-Speech, Entity-Hit-Rate, LoRA Fine-Tune, EDSA Corpus</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve niche-domain Indic ASR performance through a self-contained Text-to-Speech and Speech-to-Text method.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study implements a TTSSTT flywheel approach, using open-source Indic TTS to synthesize entity-dense utterances and applying LoRA fine-tuning on existing models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed flywheel approach significantly increases EHR on Telugu test sets compared to existing open-source and commercial systems, demonstrating transferability to real speech.</p>
<p>   &#8211; Cross-language performance varies, with notable improvements in beta-Hi and beta-Ta but not in Hindi.</p>
<p>   &#8211; The SFR improvements through per-language LoRA are specifically effective for Telugu.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03073" target="_blank">https://huggingface.co/papers/2605.03073</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233414869.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: StateSMix, State Space Model, n-gram context mixing, arithmetic coding, BPE tokens</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The development of StateSMix, a self-contained lossless compression model that operates without pre-trained weights or external dependencies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a Mamba-style State Space Model trained online, combined with sparse n-gram context mixing and arithmetic coding. </p>
<p>   &#8211; Implements an entropy-adaptive scaling mechanism and OpenMP parallelization for efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; StateSMix achieves superior compression rates compared to existing methods, such as xz -9e, across various data sizes.</p>
<p>   &#8211; Establishes the State Space Model as a dominant force in compression, with n-gram tables providing additional gains.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02904" target="_blank">https://huggingface.co/papers/2605.02904</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233346455.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, Large Language Models, Rollout Strategies, Generate-Filter-Control-Replay</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To analyze reinforcement learning post-training methods for large language models using a unified framework that focuses on rollout processes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of a lifecycle taxonomy named Generate-Filter-Control-Replay (GFCR) to decompose rollout pipelines into four modular stages for systematic evaluation and improvement.</p>
<p>   &#8211; Synthesis of various methods including RL with verifiable rewards, process supervision, judge-based gating, and adaptive compute allocation frameworks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study synthesizes current techniques and grounds the framework with case studies across different reasoning tasks such as math and code/SQL.</p>
<p>   &#8211; A diagnostic index is introduced to map rollout pathologies to GFCR modules, highlighting areas for improvement in building reproducible and trustworthy rollout pipelines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02913" target="_blank">https://huggingface.co/papers/2605.02913</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233521165.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Skills-Coach, Automated Framework, Large Language Model, Task Generation, Skill Evolution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the self-evolution of skills within LLM-based agents using Skills-Coach, addressing skill ecosystem fragmentation and achieving comprehensive competency coverage.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of four core modules: Task Generation, Lightweight Optimization, Comparative Execution, and Traceable Evaluation, for testing and optimizing skill capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Skills-Coach demonstrates significant performance improvements in skill capability across a wide range of categories, validating its potential to develop more robust and adaptive LLM-based agents.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.27488" target="_blank">https://huggingface.co/papers/2604.27488</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233601366.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202605061778110589.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Tsallis q-logarithm, reinforcement learning, cold-start stalling, Gradient-Amplified RL, Posterior-Attenuated Fine-Tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to address the challenge of cold-start stalling in reinforcement learning from verifiable rewards by introducing a loss family J_Q that interpolates between RLVR and log-marginal-likelihood.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study utilizes the Tsallis q-logarithm to define a new loss family and introduces two Monte Carlo estimators: Gradient-Amplified RL (GARL) and Posterior-Attenuated Fine-Tuning (PAFT), to handle gradient amplification for cold-start scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GARL at q=0.75 effectively mitigates cold-start stalling, outperforming other methods such as GRPO, especially in scenarios like FinQA, HotPotQA, and MuSiQue. In warm-start conditions, GARL and PAFT are analyzed for their respective effectiveness, with PAFT providing more stable gradients in certain setups.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25907" target="_blank">https://huggingface.co/papers/2604.25907</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233617164.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Chain of Evidence, Vision-Language Models, Iterative Retrieval-Augmented Generation, Visual Attribution, Bounding Boxes</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to bridge the gap in Iterative Retrieval-Augmented Generation systems by introducing the Chain of Evidence framework, which enhances precision in pixel-level evidence localization using Vision-Language Models over document screenshots.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study utilizes a retriever-agnostic visual attribution framework that leverages Vision-Language Models to process screenshots of document candidates, eliminating the need for format-specific parsing and enabling precise bounding box outputs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The experiments on benchmarks such as Wiki-CoE and SlideVQA reveal that the fine-tuned Qwen3-VL-8B-Instruct model significantly outperforms traditional text-based baselines in understanding visual layouts, offering robust performance and establishing an interpretable solution for iRAG systems at the pixel level.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01284" target="_blank">https://huggingface.co/papers/2605.01284</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233544371.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. Healthcare AI GYM for Medical Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Clinical reasoning, Reinforcement learning, Multi-turn agentic RL, Self-distillation, TT-OPD</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate the degradation of multi-turn clinical reasoning training into verbose single-turn interactions and propose a solution to improve training stability and performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a gymnasium-compatible environment across 10 clinical domains with 3.6K+ tasks and 135 tools, exploring vanilla GRPO and proposing Turn-level Truncated On-Policy Distillation (TT-OPD) as a novel framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TT-OPD significantly enhances performance, achieves the best results on 10 out of 18 benchmarks, offers faster convergence, controls response lengths, and sustains multi-turn tool use compared to non-RL baselines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02943" target="_blank">https://huggingface.co/papers/2605.02943</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233504606.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. TCDA: Thread-Constrained Discourse-Aware Modeling for Conversational Sentiment Quadruple Analysis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Conversational Sentiment Analysis, Thread-Constrained Directed Acyclic Graph, Discourse-Aware Rotary Position Embedding, Temporal Sequence, Distance Dilution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address the limitations in Conversational Aspect-based Sentiment Quadruple Analysis (DiaASQ) by capturing complex interrelationships and effectively handling the dialogue structure and temporal sequences.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proposed a novel framework combining Thread-Constrained Directed Acyclic Graph (TC-DAG) and Discourse-Aware Rotary Position Embedding (D-RoPE) to enhance the structure and sequence capturing in dialogues.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated state-of-the-art performance on benchmark datasets by using the framework to alleviate structural noise, integrate temporal sequences, and resolve Distance Dilution issues.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01717" target="_blank">https://huggingface.co/papers/2605.01717</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233430732.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. A Benchmark for Interactive World Models with a Unified Action Generation Framework</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: iWorld-Bench, world models, physical interaction capabilities, video datasets, AI-generated summary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary aim is to develop iWorld-Bench, a comprehensive benchmark to evaluate the physical interaction capabilities of world models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Construction of a diverse video dataset with 330k clips, selection of 2.1k high-quality samples, and introduction of an Action Generation Framework with six unified task types generating 4.9k test samples.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Evaluated 14 representative world models, identifying key limitations and providing insights for future research, with the iWorld-Bench model leaderboard publicly available.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.03941" target="_blank">https://huggingface.co/papers/2605.03941</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233401160.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Cross-Modal Entropy Collapse, Differentiable Gaussian Splatting, point cloud completion, continuous image-plane representation, gradient flow</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the issue of Cross-Modal Entropy Collapse during point cloud completion by proposing a novel method named SplAttN.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SplAttN replaces hard projection with Differentiable Gaussian Splatting to maintain a dense, continuous image representation and enhance the learnability of cross-modal connections.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SplAttN achieves state-of-the-art performance across multiple benchmarks, including PCN and ShapeNet-55/34, and demonstrates robust performance on the real-world KITTI benchmark, maintaining dependency on visual cues better than existing baselines.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.01466" target="_blank">https://huggingface.co/papers/2605.01466</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233329749.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, LLM Agents, Multi-Agent Systems, Orchestration Traces, Communication</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To optimize reinforcement learning for large language model-based multi-agent systems by focusing on task orchestration and coordination among agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of orchestration traces, which are temporal interaction graphs, to study events such as spawning, delegation, communication, and stopping within LLM-based agent teams.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identified critical aspects of reward design and credit assignment in RL for LLM systems, highlighting areas like parallelism, correctness, and aggregation quality. The study also notes a lack of explicit RL training methods for the stopping decision within current academic and industrial evaluations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02801" target="_blank">https://huggingface.co/papers/2605.02801</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233252383.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Conversational AI Agents, Symptom Assessment, Differential Diagnosis, Wearable Health Data</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore the performance of SymptomAI, a set of conversational AI agents, in conducting end-to-end patient interviews and differential diagnosis compared to clinicians.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A large-scale randomized study involving 13,917 participants using the Fitbit app to interact with five AI agents, assessing performance via structured and user-guided interviews.</p>
<p>   &#8211; Analysis included 1,509 conversations from a general US population panel for broader validation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SymptomAI&#8217;s diagnostic accuracy was significantly higher than clinicians when conducting structured symptom interviews.</p>
<p>   &#8211; The research demonstrated that dedicated symptom interviews eliciting additional information outperform baseline user-guided conversations.</p>
<p>   &#8211; Results indicate strong associations between acute infections and physiological shifts, validated across diverse populations beyond wearable device users.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04012" target="_blank">https://huggingface.co/papers/2605.04012</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233208701.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. Video Generation with Predictive Latents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Predictive Learning, Video Generative Modeling, Latent Space, Temporal Coherence, Motion Priors</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study explores enhancing video generative modeling using a Predictive Video VAE model by unifying predictive learning with video reconstruction to improve latent space representation and generative performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced a predictive reconstruction objective that integrates predictive learning with video reconstruction by discarding future frames randomly and encoding partial past observations to reconstruct observed frames and predict future ones.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed Predictive Video VAE (PV-VAE) model achieves superior video generation performance, including 52% faster convergence and a 34.42 FVD improvement over existing models on UCF101, demonstrating improved scalability and effective temporal coherence capture in its latent space representation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.02134" target="_blank">https://huggingface.co/papers/2605.02134</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233133138.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. X2SAM: Any Segmentation in Images and Videos</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: X2SAM, Multimodal Large Language Models, segmentation, video segmentation, conversational instructions</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce X2SAM, a unified segmentation model that extends multimodal segmentation capabilities from images to videos, supporting both conversational instructions and visual prompts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Use a Mask Memory module for temporally consistent video mask generation and joint training strategy over heterogeneous datasets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; X2SAM shows strong video segmentation performance, remains competitive on image segmentation benchmarks, and supports interactive, visual grounded segmentation across image and video inputs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.00891" target="_blank">https://huggingface.co/papers/2605.00891</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233103907.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: State-of-the-art, Supervised Fine-Tuning, Large Language Model, Academic-led Development, OpenSeeker-v2</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study presents a simplified approach to achieving state-of-the-art deep search capabilities in Large Language Model agents using minimal data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research utilizes a supervised fine-tuning strategy, enhanced by scaling knowledge graph size, expanding the tool set size, and implementing strict low-step filtering.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OpenSeeker-v2, trained with only 10.6k data points, exceeds performance benchmarks compared to complex industrial pipelines, highlighting the potential of academic-led advancements in this field.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.04036" target="_blank">https://huggingface.co/papers/2605.04036</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260506233028858.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260506233231560.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260429</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260429/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Thu, 30 Apr 2026 00:40:53 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260429/</guid>

					<description><![CDATA[1. Recursive Multi-Agent Systems 🔑 Keywords: RecursiveMAS, multi-agent systems, latent-space recursive computation, RecursiveLink module, gradient-based credit assignment 💡 Category: AI Systems and [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Recursive Multi-Agent Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RecursiveMAS, multi-agent systems, latent-space recursive computation, RecursiveLink module, gradient-based credit assignment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Explore if agent collaboration can be scaled through recursion using the RecursiveMAS framework, extending recursive scaling principles from single models to multi-agent systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a recursive multi-agent framework with a RecursiveLink module connecting agents and employing an inner-outer loop learning algorithm for system co-optimization through shared gradient-based credit assignment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RecursiveMAS demonstrates superior efficiency and accuracy over traditional multi-agent systems, achieving an average accuracy improvement of 8.3%, 1.2 to 2.4 times faster inference speed, and a significant reduction in token usage by 34.6%-75.6% across multiple benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25917" target="_blank">https://huggingface.co/papers/2604.25917</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233007640.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>2. DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Data Visualization, Cross-Platform Evolution, Intent Alignment, Native Environment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce DV-World, a comprehensive benchmark designed to evaluate data visualization agents across professional lifecycles in the real world.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a benchmark in three domains: DV-Sheet, DV-Evolution, and DV-Interact, incorporating Table-value Alignment and MLLM-as-a-Judge for evaluation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; State-of-the-art models perform below 50% in handling real-world data visualization tasks, highlighting significant challenges.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25914" target="_blank">https://huggingface.co/papers/2604.25914</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233036415.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>3. Meta-CoT: Enhancing Granularity and Generalization in Image Editing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Meta-CoT, Image Editing, Chain-of-Thought, Decomposability, Generalizability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance image editing capabilities by decomposing editing operations into a task-target-understanding framework, improving granularity and generalization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proposes Meta-CoT, which breaks down single-image editing tasks into triplets and fundamental meta-tasks, incorporating a CoT-Editing Consistency Reward to align editing with Chain-of-Thought reasoning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Meta-CoT achieves a 15.8% performance improvement across 21 editing tasks and demonstrates strong generalization to unseen tasks, with source code and benchmarks provided for public access.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.24625" target="_blank">https://huggingface.co/papers/2604.24625</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233101360.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>4. Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mutual Forcing, autoregressive audio-video generation, joint audio-video modeling, self-distillation, training-inference consistency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To efficiently generate audio-video content with long-horizon synchronization using a unified model that combines few-step and multi-step training modes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a framework called Mutual Forcing that integrates uni-modal generators into a combined model using a two-stage training strategy to optimize joint audio-video modeling and facilitate fast autoregressive generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Mutual Forcing eliminates the necessity for an additional teacher model, reduces overhead, and enhances model training with real paired data. It delivers competitive or superior results compared to existing methods with significantly fewer steps, demonstrating improved efficiency and quality.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25819" target="_blank">https://huggingface.co/papers/2604.25819</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233126664.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>5. Co-Director: Agentic Generative Video Storytelling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI-generated summary, hierarchical multi-agent framework, semantic coherence, global optimization problem, multimodal self-refinement</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance video storytelling by addressing it as a global optimization problem using a hierarchical multi-agent framework to maintain semantic coherence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework utilizes a multi-armed bandit approach to guide creative direction and employs multimodal self-refinement loops to prevent identity drift and ensure sequence-level consistency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Co-Director outperforms existing methods, offering a robust solution capable of generalizing across broader cinematic narratives and successfully evaluated using the GenAD-Bench dataset.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.24842" target="_blank">https://huggingface.co/papers/2604.24842</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233151716.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>6. Toward Scalable Terminal Task Synthesis via Skill Graphs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SkillSynth, terminal task synthesis, scenario-mediated, skill graph, execution trajectories</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the study is to introduce SkillSynth, an automated framework aimed at enhancing the diversity and quality of execution trajectories for training terminal agents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SkillSynth constructs a scenario-mediated skill graph to generate a wide array of terminal task instances, using multi-agent harnesses to instantiate these into executable tasks. The framework samples workflow paths from the graph, controlling the diversity of trajectories explicitly.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments using Terminal-Bench show that SkillSynth effectively enhances the diversity of execution trajectories. Task instances generated by SkillSynth have been used to train the Hy3 Preview, improving its capabilities in terminal-based environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25727" target="_blank">https://huggingface.co/papers/2604.25727</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233224586.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>7. MAIC-UI: Making Interactive Courseware with Generative UI</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Zero-code, Interactive STEM courseware, Incremental generation, Pedagogical rigor, Multi-modal understanding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Education</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the study is to introduce MAIC-UI, a zero-code system designed to empower educators to create and edit interactive STEM courseware rapidly and efficiently, thus overcoming traditional barriers such as the need for HTML/CSS/JavaScript expertise.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs a zero-code authoring system that utilizes structured knowledge analysis with multi-modal understanding, a generate-verify-optimize pipeline, and Click-to-Locate editing with Unified Diff-based incremental generation for rapid iteration cycles.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MAIC-UI decreases editing iterations and enhances learnability and controllability compared to direct Text-to-HTML generation. Classroom deployments demonstrated significant improvements in learning outcomes, fostering learning agency and reducing outcome disparities in students.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25806" target="_blank">https://huggingface.co/papers/2604.25806</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233254672.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>8. IAM: Identity-Aware Human Motion and Shape Joint Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: identity-aware motion generation, body morphology, motion dynamics, multimodal signals, joint motion-shape generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Propose an identity-aware motion generation framework that models the relationship between body morphology and motion dynamics using multimodal signals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize multimodal signals such as natural language descriptions and visual cues for representing identity.</p>
<p>   &#8211; Introduce a joint motion-shape generation paradigm to synthesize motion sequences alongside body shape parameters.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The framework improves motion realism and motion-identity consistency while maintaining high motion quality as demonstrated on motion capture datasets and large-scale in-the-wild videos.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25164" target="_blank">https://huggingface.co/papers/2604.25164</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260429233331632.mp4"></video> </figure>
</p>
</div>
<div style='height:30px'></div>
<h3>9. GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: GoClick, GUI element grounding, vision-language model, encoder-decoder architecture, Progressive Data Refinement</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce GoClick, a lightweight model for GUI element grounding on mobile devices, focusing on high accuracy and low computational needs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implement an encoder-decoder architecture to develop a model with only 230M parameters.</p>
<p>   &#8211; Utilize Progressive Data Refinement techniques for data optimization, including task type filtering and data ratio adjustment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GoClick achieves high visual grounding accuracy, comparable to larger models, while maintaining efficiency.</p>
<p>   &#8211; Enhances GUI agent performance in a device-cloud collaboration framework by improving element localization and success rates.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.23941" target="_blank">https://huggingface.co/papers/2604.23941</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233401692.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>10. The Last Harness You&#8217;ll Ever Build</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, automated harness engineering, evolutionary loops, meta-learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to automate the deployment of AI agents by using a two-level framework that optimizes task-specific harnesses through evolutionary loops and meta-learning protocols.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework consists of two levels: the first level optimizes a worker agent&#8217;s harness for a single task using the Harness Evolution Loop, while the second level, the Meta-Evolution Loop, optimizes the evolution protocol to enable quick harness adaptation across diverse tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed framework shifts the need for manual harness engineering to an automated process, potentially eliminating the need for human intervention in adapting agents to new task domains.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.21003" target="_blank">https://huggingface.co/papers/2604.21003</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233455996.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>11. Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: multilingual TTS, linguistic diversity, perceptual dimensions, SHAP analysis, Bradley-Terry modeling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to develop a controlled multidimensional pairwise evaluation framework for multilingual Text to Speech (TTS) systems, with a focus on linguistic control and perceptual annotation across 10 Indic languages.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study involves evaluating 7 state-of-the-art TTS systems using over 5,000 native and code-mixed sentences, collecting more than 120,000 pairwise comparisons from 1,900 native raters. The evaluations are made across 6 perceptual dimensions including intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations, utilizing Bradley-Terry modeling and SHAP analysis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research constructs a multilingual leaderboard and interprets human preferences while analyzing the reliability of the leaderboard. It highlights the model strengths and trade-offs across different perceptual dimensions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.21481" target="_blank">https://huggingface.co/papers/2604.21481</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233429564.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>12. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202604291777505719.jpg"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>13. Offline Evaluation Measures of Fairness in Recommender Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: recommender system fairness, fairness evaluation measures, AI Ethics and Fairness, robustness, guidelines</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Ethics and Fairness</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to address the limitations in the evaluation of fairness in recommender systems by analyzing theoretical flaws and developing novel approaches to improve the robustness and applicability of fairness evaluation measures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conduct theoretical and empirical analysis of existing fairness evaluation measures to expose their limitations.</p>
<p>   &#8211; Investigate a wide range of offline evaluation measures across different fairness notions, focusing on both users and items, and varying evaluation granularities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Propose new evaluation approaches and measures that overcome existing limitations, thereby enhancing interpretability and applicability.</p>
<p>   &#8211; Provide guidelines for selecting appropriate fairness evaluation measures, facilitating more precise application in practical scenarios and thus advancing the state-of-the-art in the offline evaluation of fairness in recommender systems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25032" target="_blank">https://huggingface.co/papers/2604.25032</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233510558.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>14. Seeing Isn&#8217;t Believing: Uncovering Blind Spots in Evaluator Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-Language Models, Evaluator VLMs, image-to-text, text-to-image, perturbations</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study systematically evaluates the reliability issues of current Evaluator VLMs in detecting various types of output errors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces targeted perturbations across key error dimensions and evaluates 4 prominent VLMs using over 4000 perturbed instances and multiple evaluation techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current VLM evaluators have substantial blind spots, particularly with fine-grained compositional and spatial errors, revealing their unreliable nature for benchmarking and urging caution in their development use. Code and data have been made publicly available.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.21523" target="_blank">https://huggingface.co/papers/2604.21523</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233443166.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>15. AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AutoGUI-v2, autonomous agents, Vision-Language Models, functionality understanding, interaction logic</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to evaluate autonomous agents&#8217; capability in understanding and predicting interactions within Graphical User Interfaces (GUIs) using the AutoGUI-v2 benchmark.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a novel VLM-human collaborative pipeline for parsing multi-platform screenshots to create hierarchical functional regions and diverse evaluation tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The evaluation found that open-source Vision-Language Models fine-tuned on agent data excel in functional grounding, whereas commercial models perform better in functionality captioning. However, all models exhibit challenges with complex interaction logic, indicating that deep functional understanding remains crucial for advancing GUI agents.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.24441" target="_blank">https://huggingface.co/papers/2604.24441</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233414547.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>16. A Systematic Post-Train Framework for Video Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Supervised Fine-Tuning, Reinforcement Learning from Human Feedback, Group Relative Policy Optimization, temporal coherence, visual quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To bridge the gap between pretraining performance of video diffusion models and real-world deployment requirements by enhancing controllability, temporal coherence, and visual quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Four-stage post-training framework involving Supervised Fine-Tuning, Reinforcement Learning from Human Feedback with Group Relative Policy Optimization, Prompt Enhancement, and Inference Optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed pipeline significantly mitigates common artifacts and enhances controllability, visual aesthetics, while maintaining efficient sampling costs, offering a practical blueprint for real-world deployment.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25427" target="_blank">https://huggingface.co/papers/2604.25427</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233345523.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>17. V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Variational GRPO, ELBO-based surrogates, generative models, human preferences, text-to-image synthesis</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary goal is to improve text-to-image synthesis by aligning generative models more efficiently with human preferences using the Variational GRPO method.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; This method combines ELBO-based surrogates with Group Relative Policy Optimization (GRPO), enhancing stability and efficiency in the alignment process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Variational GRPO achieves state-of-the-art performance in text-to-image synthesis with significant speed improvements over previous methods like MixGRPO and DiffusionNFT.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.23380" target="_blank">https://huggingface.co/papers/2604.23380</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233315545.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>18. TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-policy distillation, Trajectory-Level KL Instability, Temporal Curriculum, multi-turn agent settings</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate the instability and limitations of vanilla On-policy distillation (OPD) in multi-turn agent settings, particularly focusing on Trajectory-Level KL Instability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce TCOD (Temporal Curriculum On-Policy Distillation), employing a curriculum approach to progressively expand trajectory depth and enhance training stability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TCOD effectively mitigates KL escalation, increases KL stability, and significantly improves agent performance in multi-turn tasks compared to vanilla OPD, with improvements up to 18 points. It can also surpass the teacher&#8217;s performance and generalize to tasks where the teacher fails.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.24005" target="_blank">https://huggingface.co/papers/2604.24005</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233237568.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>19. BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: BARRED, custom guardrails, synthetic training data, multi-agent debate, dimension decomposition</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a framework called BARRED for generating synthetic training data that enhances the performance of custom guardrail policies over existing language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes dimension decomposition and multi-agent debate to generate diverse and high-fidelity synthetic data without extensive human annotation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The BARRED framework allows small language models to outperform state-of-the-art proprietary models by relying on synthetic data, highlighting the importance of dimension decomposition and debate-based verification for effective model fine-tuning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25203" target="_blank">https://huggingface.co/papers/2604.25203</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233203599.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>20. Step-Audio-R1.5 Technical Report</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Audio Language Models, Reinforcement Learning, Human Feedback, Chain-of-Thought, Immersive Dialogue</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate the limitations of current reinforcement learning paradigms in audio language models, specifically addressing the &#8220;verifiable reward trap&#8221; and its impact on conversational quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employ Reinforcement Learning from Human Feedback (RLHF) to refine audio reasoning capabilities, introducing Step-Audio-R1.5 to enhance interactive experiences.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Step-Audio-R1.5 effectively maintains analytical reasoning while transforming interaction quality, bridging the gap between mechanical verification and sensory empathy for immersive long-turn dialogues.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25719" target="_blank">https://huggingface.co/papers/2604.25719</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233139996.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>21. Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Refinement via Regeneration, Unified multimodal models, text-to-image, semantic alignment, conditional image regeneration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve multi-modal model refinement by transitioning from editing-based approaches to conditional image regeneration, leading to better semantic alignment in text-to-image tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A novel framework called Refinement via Regeneration is proposed that refines images by regenerating them based on conditional inputs, avoiding traditional editing methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated significant improvement in evaluation metrics such as Geneval, DPGBench, and UniGenBench++, proving the efficacy of the RvR approach.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25636" target="_blank">https://huggingface.co/papers/2604.25636</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233114796.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>22. AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, AutoResearchBench, Deep Research, Wide Research, autonomous scientific research</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To present AutoResearchBench, a benchmark designed to evaluate AI agents&#8217; capability in autonomous scientific literature discovery.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes two task types: Deep Research, involving multi-step probing, and Wide Research, which requires comprehensive paper collection.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AutoResearchBench sets a high difficulty benchmark, showing that powerful LLMs achieve low accuracy rates (9.39% in Deep Research and 9.31% in Wide Research) compared to previous benchmarks.</p>
<p>   &#8211; Dataset, evaluation pipeline, and code are publicly released to encourage further research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.25256" target="_blank">https://huggingface.co/papers/2604.25256</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233048585.png"></figure>
</p>
</div>
<div style='height:30px'></div>
<h3>23. Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Programming with Data, structured knowledge representation, language models, domain-specific capabilities, data repair</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To create a principled framework for systematically transferring human expertise into large language models using structured knowledge representation and systematic feedback.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Training data is treated as source code, enabling unit testing and debugging to address model failures identified as concept-level gaps and reasoning-chain breaks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrates that the training data and model behavior relationship is traceable and repairable. This approach provides consistent improvements across different model scales and architectures without degrading general capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2604.24819" target="_blank">https://huggingface.co/papers/2604.24819</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260429233022863.png"></figure>
</p>
</div>
<div style='height:30px'></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260429233331632.mp4" length="0" type="video/mp4" />

			</item>
	</channel>
</rss>
