<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI Native Foundation</title>
	<atom:link href="https://ainativefoundation.org/feed/" rel="self" type="application/rss+xml" />
	<link>https://ainativefoundation.org</link>
	<description></description>
	<lastBuildDate>Fri, 12 Jun 2026 00:42:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://ainativefoundation.org/wp-content/uploads/2024/05/cropped-favicon-32x32.png</url>
	<title>AI Native Foundation</title>
	<link>https://ainativefoundation.org</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>AI Native Daily Paper Digest &#8211; 20260611</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260611/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Fri, 12 Jun 2026 00:42:00 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260611/</guid>

					<description><![CDATA[1. Redesign Mixture-of-Experts Routers with Manifold Power Iteration 🔑 Keywords: Mixture-of-Experts, Router, Principal Singular Direction, Manifold Power Iteration 💡 Category: Machine Learning [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Redesign Mixture-of-Experts Routers with Manifold Power Iteration</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mixture-of-Experts, Router, Principal Singular Direction, Manifold Power Iteration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose a novel router redesign for Mixture-of-Experts models by aligning router rows with the principal singular directions of expert matrices to enhance model effectiveness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a &#8220;Power-then-Retract&#8221; paradigm within Manifold Power Iteration, applying a power iteration step on router weights followed by retraction to impose norm constraints, ensuring efficiency and stability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Empirical evidence from pretraining MoE models with parameters ranging from 1B to 11B confirms that the proposed alignment improves the effectiveness of Mixture-of-Experts models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12397" target="_blank">https://huggingface.co/papers/2606.12397</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img fetchpriority="high" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233005979.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>2. Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Models, Agentic Environments, Neural Synthesis, Symbolic Synthesis, Environment Engineering Lifecycle</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To systematically analyze the environments for Large Language Model agents in terms of their engineering lifecycle stages and capabilities evolution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Comprehensive study of environments through eight attributes and domains, along with the introduction of symbolic and neural synthesis paradigms.</p>
<p>   &#8211; Evaluation of environment evolution via neural-driven, difficulty-driven, and scaling-driven approaches.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identifies key pathways for agent evolution using four perspectives: memory, orchestration, trajectory, and exploration.</p>
<p>   &#8211; Discusses future directions, including Environment-as-a-Service and Multi-agent Environments.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12191" target="_blank">https://huggingface.co/papers/2606.12191</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233038704.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>3. Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: teacher-student framework, reward models, visual preference, Z-Reward, text-to-image optimization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to improve preference accuracy and optimization performance in text-to-image training by decoupling complex reasoning from efficient reward deployment through a teacher-student framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces Z-Reward, a framework where the teacher, a large VLM, uses reasoning to infer score distributions and is trained with Group-wise Direct Score Optimization. The student employs Reasoning-Internalized Score Distillation to encode the teacher&#8217;s score distribution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The Z-Reward framework demonstrates significant improvements in human preference accuracy in internally annotated evaluations, with both the teacher and student models outperforming existing baselines. Additionally, it provides a differentiable reward signal leading to substantial enhancements in text-to-image optimization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09076" target="_blank">https://huggingface.co/papers/2606.09076</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233109245.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>4. Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Spatial reasoning, egocentric videos, Geometry-to-Video pipeline, MLLM, 3D geometry</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper proposes a training-free framework named &#8220;Reason, then Re-reason (ReRe)&#8221; for improving spatial reasoning from egocentric videos by allowing revisitation of conclusions through synthesized novel-view videos generated from predicted 3D geometry. </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The ReRe framework operates in two phases: the Reason Phase and the Re-reason Phase, utilizing MLLM to first form a spatial hypothesis from original videos and then verify or revise it through observing synthesized novel-view videos. A Geometry-to-Video pipeline is utilized for rendering complementary novel views from 3D geometry, providing elevated, oblique perspectives.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Extensive evaluations on VSI-Bench and STI-Bench show that the ReRe framework significantly enhances the performance of open-source MLLMs, making them competitive with proprietary state-of-the-art methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11683" target="_blank">https://huggingface.co/papers/2606.11683</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233133496.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>5. World Pilot: Steering Vision-Language-Action Models with World-Action Priors</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: World Pilot, Vision-Language-Action models, World-Action Model, zero-shot out-of-distribution</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Enhance Vision-Language-Action models with dynamic scene evolution and trajectory priors from a World-Action Model to improve performance in zero-shot out-of-distribution manipulation tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce World Pilot framework that integrates dynamic priors through Latent Steering and Action Steering pathways to augment policy for VLA models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; World Pilot achieves a state-of-the-art total success rate of 84.7% on the LIBERO-Plus zero-shot OOD benchmark and excels across real-robot settings, demonstrating high success rates even under varying conditions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12403" target="_blank">https://huggingface.co/papers/2606.12403</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233159579.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>6. ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ComBench, Olympiad-level, combinatorial reasoning, large language models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces a new benchmark named ComBench to evaluate the combinatorial reasoning capabilities of large language models using Olympiad-level problems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; ComBench includes 100 human-annotated problems divided into analysis-centric and construction-centric settings, using rubric-guided proof grading and deterministic construction verification for evaluation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study found that even the strongest models are not fully adept at tackling Olympiad-level combinatorial problems, with a top model scoring 65.4% on average. It highlights the distinction between rigorous proof reasoning and constructive realization as separate capabilities of the models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.10479" target="_blank">https://huggingface.co/papers/2606.10479</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233229018.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>7. InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal Contextual Reasoning, Multimodal Multi-head Latent Attention, Video Agent, Long-Horizon Tasks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance long-horizon multimodal tasks through Multimodal Contextual Reasoning and efficient attention mechanisms, particularly focusing on video understanding challenges.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research employs a novel framework named InternVideo3, which utilizes Multimodal Contextual Reasoning and introduces Multimodal Multi-head Latent Attention to improve efficiency in video task processing. The training process involves staged training with components like continued pretraining, short-to-long supervised fine-tuning, rule-based reinforcement learning, and on-policy distillation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; InternVideo3 exhibits strong performance on video understanding benchmarks such as Video-MME, MLVU, and EgoSchema. It also demonstrates robust and evidence-grounded behavior as a video agent, suggesting that efficient context handling and closed-loop reasoning are critical for long-horizon visually grounded agency.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12195" target="_blank">https://huggingface.co/papers/2606.12195</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233251828.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>8. TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TRACE, rollout allocation, reward contrast, multi-turn, tree-structured rollouts</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Improve reward contrast in multi-turn agentic reinforcement learning through dynamic resource distribution across tree-structured rollouts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Model each ReAct-style thought-action-observation turn as a distinct node for enhanced budget allocation.</p>
<p>   &#8211; Use a shared generalizable predictor to estimate conditional success probability for guiding allocation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TRACE framework enriches outcome-only feedback and amplifies the policy-update signal.</p>
<p>   &#8211; Achieved improvements in performance and efficiency, notably improving Qwen3-14B Multi-Hop QA accuracy by 2.8 points with competitive baselines at equal sampling cost.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11119" target="_blank">https://huggingface.co/papers/2606.11119</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233318402.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>9. ICA Lens: Interpreting Language Models Without Training Another Dictionary</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Independent Component Analysis, Interpretable Directions, Language Model Representations, Sparse Autoencoders, ICA Lens</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to assess the efficacy of Independent Component Analysis (ICA) for identifying interpretable directions in language model representations as a faster alternative to sparse autoencoder training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; ICA Lens, a novel workflow combining optimized GPU-parallel FastICA pipeline with LLM-specific stability recipes and diagnostics, was introduced to analyze LLM activations efficiently.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ICA Lens demonstrates competitive performance to sparse autoencoders in probing tasks and performs better in targeted probe perturbations under smaller budgets, suggesting that it should be considered a robust option for language model interpretability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11722" target="_blank">https://huggingface.co/papers/2606.11722</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233340996.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>10. Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Embodied Foundation Model, Embodied Reasoning, Multi-task Balanced Reinforcement Learning, Zero-shot Real-robot Experiments, AI Native </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Embodied-R1.5, a unified Embodied Foundation Model aimed at enhancing embodied reasoning capabilities and achieving state-of-the-art performance on embodied vision-language benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The model employs a multi-task balanced reinforcement learning approach and integrates a Planner-Grounder-Corrector (PGC) closed-loop framework to autonomously execute and self-correct long-horizon tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Embodied-R1.5 achieves state-of-the-art results in 16 out of 24 benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5, and demonstrates robust generalization to the physical world through extensive zero-shot real-robot experiments. The model&#8217;s components, including weights and datasets, have been open-sourced to aid future research in Embodied Foundation Models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11324" target="_blank">https://huggingface.co/papers/2606.11324</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260611233404666.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>11. World Model Self-Distillation: Training World Models to Solve General Tasks</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-distillation, Reinforcement Learning, Video Diffusion Model, Vision-Language Model, Task Solving</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a scalable framework that leverages self-distillation and reinforcement learning to transfer task-solving abilities from vision-language models to video diffusion models without requiring labeled task-video data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Combining self-distillation with reinforcement learning to elicit task-solving ability of pretrained video generators, supported by vision-language model-generated candidate tasks and step-by-step solutions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The Executor, enhanced through reinforcement learning from VLM feedback, surpasses the performance of the Demonstrator in task-solving capabilities, especially when evaluated through VLM-based protocol and benchmark tests such as WorldTasks-Benchmark and DreamGen robotics benchmark.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12072" target="_blank">https://huggingface.co/papers/2606.12072</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233432582.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>12. ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ReVision, Computer-use agents, visual tokens, patch selector, multimodal language models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance efficiency in computer-use agents by reducing visual token usage through the removal of redundant visual patches in consecutive screenshots, while maintaining essential spatial information.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method involves a learned patch selector used within a framework named ReVision, which is applied to train multimodal language models. The selector compares patch representations across screenshots to identify and remove redundant patches.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Implementing ReVision results in a significant reduction of token usage by 46% on average, while improving the success rate by 3% across benchmarks such as OSWorld, WebTailBench, and AgentNetBench, thus demonstrating enhanced efficiency in processing longer interaction trajectories with fewer tokens.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.11212" target="_blank">https://huggingface.co/papers/2605.11212</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233520742.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>13. i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: text-to-image diffusion, diffusion models, open models, i1 model</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study aims to investigate the design choices in text-to-image diffusion models and develop i1, a new 3B-parameter model that maintains transparency and matches leading performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conducted over 300 controlled experiments consuming 700K+ TPU v6e hours to analyze and identify effective modeling and data design choices for text-to-image diffusion models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The i1 model is created utilizing publicly available datasets and achieves competitive results across five representative benchmarks, outperforming existing fully open models by an average of 29.5 percentage points. The study offers the i1 model checkpoints, training and inference code, and a data processing pipeline to facilitate open research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11289" target="_blank">https://huggingface.co/papers/2606.11289</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233454369.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>14. Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RUL prediction, industrial sensor data, TSFM, Chronos-2, frozen pretrained model</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve Remaining Useful Life (RUL) prediction performance using a lightweight approach by integrating a frozen pretrained time-series foundation model with a simple regression head.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a frozen pretrained time-series foundation model, TSFM, specifically Chronos-2, as the backbone for feature extraction from multivariate sensor data.</p>
<p>   &#8211; Implemented a lightweight regression neural network to estimate RUL.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The approach demonstrated superior RUL prediction performance compared to recurrent, convolutional, Transformer-based, and gradient-boosting methods on real-world industrial sensor data.</p>
<p>   &#8211; Longer historical data significantly enhances prediction performance, suggesting TSFM&#8217;s practical and efficient application in industrial RUL estimation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11990" target="_blank">https://huggingface.co/papers/2606.11990</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233545993.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>15. Large Language Models Are Overconfident in Their Own Responses</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Instruction tuning, Chat template, Calibration, Ownership bias, Confidence elicitation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the calibration issue in instruction-tuned large language models, particularly focusing on the role of chat templates and ownership bias.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Decoupling the effects of post-training algorithms and chat template formats to study their impact on calibration.</p>
<p>   &#8211; Conducting extensive experiments across six recent open-weight LLMs, three benchmarks, and three methods of confidence elicitation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Instruction tuning harms the calibration of LLMs, with chat templates exacerbating overconfidence via ownership bias.</p>
<p>   &#8211; Proposing a novel inference-time strategy to frame the model&#8217;s answer as user input, improving calibration by up to 26% without retraining.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03437" target="_blank">https://huggingface.co/papers/2606.03437</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233610026.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>16. Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ModSleuth, LLM Dependency Graphs, Source-Grounded Evidence, License Obligations, Train-Evaluation Coupling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop an agentic system (ModSleuth) that reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Formalization distinguishing between direct and indirect dependencies.</p>
<p>   &#8211; Operation-centered relationships to represent heterogeneous pipeline roles.</p>
<p>   &#8211; Resolving artifact identities across names, versions, and repositories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ModSleuth successfully recovered 1,060 source-verified dependencies to construct large-scale dependency graphs for LLMs.</p>
<p>   &#8211; Revealed multi-hop license obligations, train-evaluation coupling, and documentation inconsistencies in modern LLM development.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12385" target="_blank">https://huggingface.co/papers/2606.12385</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233700016.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>17. Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continual Instruction Tuning, Large Language Models, low-resource languages, Kupang Malay, bilingual dictionary</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the translation performance of large language models on low-resource languages like Kupang Malay by using a new fine-tuning approach.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employing a training paradigm called Continual Instruction Tuning (CIT) that utilizes instruction-based training with a bilingual dictionary.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed model, Lius, significantly improves translation accuracy over standard models and surpasses traditional translation systems on various evaluation metrics, reducing the need for large-scale parallel data.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11786" target="_blank">https://huggingface.co/papers/2606.11786</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233633750.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>18. Towards Diverse Scientific Hypothesis Search with Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Evolutionary framework, Large language models, Hypothesis generation, Diversity, Quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop an evolutionary framework that enhances diversity and quality in generating scientific hypotheses using multi-temperature sampling and information exchange.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employs an evolutionary framework inspired by parallel tempering that explores multiple temperature levels for hypothesis generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The approach consistently improves the quality and diversity of hypotheses across various domains, operating efficiently under a fixed validation budget while maintaining robustness against expensive downstream computational validations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.10587" target="_blank">https://huggingface.co/papers/2606.10587</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233722713.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>19. FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Brain Age Prediction, FlowLet, generative data augmentation, flow matching, invertible 3D wavelet domain</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve brain age prediction performance for underrepresented age groups by synthesizing age-conditioned 3D MRIs using a novel framework FlowLet.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized FlowLet, a conditional generative framework that employs flow matching within an invertible 3D wavelet domain to avoid reconstruction artifacts and reduce computational demands.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; FlowLet effectively generates high-fidelity MRI volumes with minimal sampling steps and enhances brain age prediction models by providing diverse data, increasing performance in underrepresented age groups while preserving anatomical structures.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2601.05212" target="_blank">https://huggingface.co/papers/2601.05212</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233750344.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>20. Can Generalist Agents Automate Data Curation?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Automated data curation, generalist coding agents, Curation-Bench, method-guided exploration, data-selection policy</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Explore whether generalist coding agents can automate the data-curation loop in modern AI development.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of Curation-Bench, an agent-centric benchmark allowing command-line access for implementing and submitting data policies to a fixed training/evaluation pipeline.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current agents can autonomously compose data-selection policies that outperform strong baselines, but scaffolded method adaptation is required for reliable data research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04261" target="_blank">https://huggingface.co/papers/2606.04261</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233814678.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>21. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202606111781221114.jpg"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>22. Distilling LLM Feedback for Lean Theorem Proving</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Feedback Distillation, self-distillation, token-level supervision, language model, GRPO</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve post-training techniques for reasoning models by introducing Feedback Distillation combined with GRPO.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Feedback Distillation employs token-level supervision and integrates privileged feedback from language models for enhanced training diversity and trajectory generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Feedback Distillation maintains greater diversity and complements GRPO, with superior performance when GRPO is initialized from a Feedback Distillation checkpoint.</p>
<p>   &#8211; This approach shows promise for enhancing post-training in complex reasoning tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.30861" target="_blank">https://huggingface.co/papers/2605.30861</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233825140.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>23. SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SparDA, Sparse attention, KV cache, Forecast projection, AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance long-context LLM inference by addressing KV cache bottlenecks and attention complexity using a novel decoupled sparse attention architecture called SparDA.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces SparDA architecture with an additional Forecast projection for lookahead selection, reducing selection overhead with one Forecast head per GQA group.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SparDA maintains or improves accuracy, achieving up to 1.25 times prefill speedup and 1.7 times decode speedup over the baseline, enabling larger batch sizes on a single GPU with up to 5.3 times higher decode throughput.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04511" target="_blank">https://huggingface.co/papers/2606.04511</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233801044.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>24. τ-Rec: A Verifiable Benchmark for Agentic Recommender Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Agentic Recommender Systems, Conversational Interfaces, Verifiable Rewards, Reliability Metric, Qwen/Qwen2.5-Coder-32B-Instruct</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces a benchmark, τ-Rec, aimed at evaluating the reliability of agentic recommender systems using verifiable rewards and controlled dialogue constraints.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research employs τ-Rec as a benchmark tool to test agents against structured catalog predicates and utilizes a pass^k reliability metric to systematically assess reasoning consistency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Findings reveal significant reliability challenges, with the best model achieving only ~57% at pass^1 and ~38% at pass^4, demonstrating a critical gap in current conversational agent deployment.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.10156" target="_blank">https://huggingface.co/papers/2606.10156</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233732950.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>25. Building Social World Models with Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Social World Model, temporal pattern mining, social belief dynamics, prediction markets, state-of-the-art results</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces the Social World Model (SWM), a framework designed to capture the evolution of social beliefs in response to major events.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SWM employs temporal pattern mining and evidence lower bound optimization to learn state-transition functions for social beliefs without requiring explicit human annotations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SWM significantly outperforms time-series foundation models, achieving state-of-the-art performance on Kalshi data and demonstrating competitive results on Polymarket data.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11482" target="_blank">https://huggingface.co/papers/2606.11482</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233712625.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>26. APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Network-native, APEX, Wireless Network Telemetry, Decoder-Only Transformer, Privacy-Preserving Inference</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To present APEX, a network-native, decoder-only transformer model for forecasting enterprise access point (AP) telemetry and evaluate its performance on DHCP degradation tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Pre-training of APEX on 10-channel multivariate telemetry from approximately 4,500 production wireless networks, with evaluation against existing foundation models and traditional methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; APEX-Large outperforms existing models by reducing MAE by 18% over the best foundation-model baseline and 38% over SARIMA on a 192-step DHCP degradation benchmark. APEX-Edge offers sub-second, privacy-preserving inference, suggesting the practicality of network-native pre-training for proactive wireless operations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11553" target="_blank">https://huggingface.co/papers/2606.11553</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233647978.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>27. Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language models, procedural knowledge, skill compression</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces SKIM, an adaptive multi-resolution soft token compression framework, aiming to efficiently compress procedural skills while maintaining task performance in Large language model (LLM) applications.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SKIM adapts to various skill complexities by creating different numbers of soft tokens to improve the efficiency of LLM inference and preserve the effectiveness of skill usage.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SKIM effectively reduces the token length of procedural skills to 30-60% of their original size while preserving task performance better than existing methods. The authors have made their code available publicly.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12203" target="_blank">https://huggingface.co/papers/2606.12203</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233622754.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>28. DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DRIFT, vision-language models, continuous decoding, flow matching, robotic control</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces DRIFT, a framework aimed at improving pretrained vision-language models for continuous decoding tasks by integrating coarse prediction with iterative refinement through flow matching.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; DRIFT employs a base predictor coupled with a generative refinement module using flow matching to iteratively enhance predictions, transforming the generative modeling problem to localize around a strong prior.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DRIFT consistently outperforms existing regression- and generative-based solutions in perception and planning tasks, such as visual grounding and robotic control, across multiple architectures.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05758" target="_blank">https://huggingface.co/papers/2606.05758</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260611233556880.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>29. Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ART, Parameter-Efficient Fine-Tuning, LoRA, Computational Graphs, Multimodal Large Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enable parameter-efficient fine-tuning of frozen multimodal language models by optimizing raw visual inputs through gradient backpropagation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced ART (Art-based Reinforcement Training) for injecting information into a frozen Multimodal Large Language Model by optimizing its raw visual input without altering precompiled computational graphs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ART demonstrates performance comparable to LoRA, achieving competitive accuracy across various benchmarks, especially in mathematics and structured-tool-use scenarios.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11854" target="_blank">https://huggingface.co/papers/2606.11854</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233533461.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>30. POISE: Position-Aware Undetectable Skill Injection on LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: POISE, Skill-Poisoning Attack, LLM Scanners, Attack Success Rate, Context-Aware Generator</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of this research is to introduce and evaluate POISE, a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions to maintain high attack success rates while avoiding detection.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research used position-aware skill-poisoning attacks blended with context-aware generation to evaluate the detection success rates, leveraging established frameworks like Skill-Inject with codex+gpt-5.2.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; POISE achieved an 89.3% Attack Success Rate, outperforming both random body placement and YAML-only baselines, while retaining stealth advantages. Its design minimizes detection by LLM scanners, which tend to mistakenly flag benign skills.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07943" target="_blank">https://huggingface.co/papers/2606.07943</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233507100.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>31. Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PACI, asynchronous pipeline, gradient accumulation, training throughput, stability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance the efficiency of asynchronous pipeline training for large neural networks by controlling weight inconsistency, improving throughput and training time-to-accuracy without compromising stability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced PACI, a method using local gradient accumulation as a version-control mechanism to manage forward/backward weight inconsistency in pipelines without needing weight stashing or global synchronization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PACI achieves similar stability and memory usage as synchronous pipelines while significantly increasing training throughput and reducing time-to-accuracy by up to 1.69 times compared to traditional methods, demonstrating that carefully controlled inconsistency can enhance training efficiency.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07881" target="_blank">https://huggingface.co/papers/2606.07881</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233442941.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>32. Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Recursive Automated Composition, Reinforcement Learning, Verifiable Environments, Large Language Models, Recursive Composition</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance the reasoning capabilities of Large Language Models through a framework that enables scalable reinforcement learning by compositing verifiable environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of RACES, a framework utilizing compositional operators to automatically combine 300 verifiable environments, focusing on recursive composition methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RACES improves reasoning generalization, achieving better performance metrics for models such as DeepSeek-R1-Distill-Qwen-14B and Qwen3-14B on six benchmarks while maintaining efficiency in environment utilization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12373" target="_blank">https://huggingface.co/papers/2606.12373</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233419749.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>33. EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EvoTrainer, Autonomous LLM Training, Empirical Feedback, Reusable Skills, Diagnostics</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To demonstrate the superior performance of EvoTrainer in evolving language model policies and training harnesses autonomously through empirical feedback, surpassing traditional handcrafted methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; EvoTrainer autonomously co-evolves LLM policies and training harnesses by diagnosing rollout-level evidence, revising diagnostics, backtesting interventions, and accumulating reusable skills.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EvoTrainer matches or exceeds human-engineered RL references, especially in long-horizon agentic software engineering tasks, by preventing invalid high-scoring branches and shaping later strategies through reusable skills.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03108" target="_blank">https://huggingface.co/papers/2606.03108</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233352095.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>34. Reroute, Don&#8217;t Remove: Recoverable Visual Token Routing for Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-language models, visual tokens, token reduction, grounding-sensitive queries, Reroute</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve grounding performance in vision-language models by employing recoverable routing instead of irreversible visual-token pruning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced a training-free plug-in called Reroute, where selected vision tokens bypass certain stages only to re-enter later, using existing attention-score ranking rules to enhance performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Reroute improves grounding performance under aggressive token reduction while maintaining general VQA performance, suggesting a shift in VLM token reduction perspective from irreversible pruning to recoverable routing.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12412" target="_blank">https://huggingface.co/papers/2606.12412</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233328935.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>35. Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, Multi-Token Prediction, Model Entropy, Probabilistic Rejection Sampling, TV Loss</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to address efficiency bottlenecks in reinforcement learning training for large language models by optimizing multi-token prediction techniques.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors conducted a systematic study on Multi-Token Prediction (MTP) in post-training of large language models, applying entropy-aware sampling and novel training objectives to improve acceptance rates and inference throughput.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed novel end-to-end TV loss optimizes the multi-step rejection sampling acceptance rate, achieving significant improvements in acceptance rates and inference throughput across various tasks. Experimental results demonstrate the methodology achieves up to 1.8x acceleration in async RL training.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12370" target="_blank">https://huggingface.co/papers/2606.12370</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233304056.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>36. Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Grammar-Constrained Decoding, CodeSpear, CodeShield, Large Language Models, Jailbreak Attack</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to uncover the risks associated with Grammar-Constrained Decoding (GCD) in code generation by exposing its potential as an attack surface and proposes a solution to mitigate these risks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study reveals a new jailbreak attack called CodeSpear that exploits GCD to induce Large Language Models into generating malicious code. It further introduces CodeShield, a safety alignment approach to counteract this vulnerability effectively.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate that CodeSpear significantly increases the attack success rate, revealing inherent risks in GCD. Conversely, CodeShield is shown to restore safety while maintaining benign functionality, emphasizing the need for attention to GCD&#8217;s security implications.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11817" target="_blank">https://huggingface.co/papers/2606.11817</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233240620.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>37. On Subquadratic Architectures: From Applications to Principles</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: xLSTM, Sequence Modeling, Memory Dynamics, State Tracking, Gating Scheme</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To compare the efficiency and effectiveness of three sequence modeling approaches: xLSTM, Mamba-2, and Gated DeltaNet, particularly focusing on tasks with complex dependencies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The models were evaluated across multiple scenarios: code-model pre-training, code model distillation from large language models, and pre-training of time-series foundation models, with additional analysis on synthetic length-generalization tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; xLSTM outperforms Mamba-2 and Gated DeltaNet due to its superior state tracking and memory dynamics, providing more flexible and stable memory correction via its gating scheme.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12364" target="_blank">https://huggingface.co/papers/2606.12364</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233218401.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>38. DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DeNovoSWE, whole-repository generation, sandboxed agentic workflow, divide and conquer, Qwen3-30B-A3B</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce DeNovoSWE, a large-scale dataset designed for training code agents to generate entire software repositories from documentation, aiming to enhance long-horizon software engineering tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The dataset is created using a sandboxed agentic workflow for automated construction, employing a &#8220;divide and conquer&#8221; approach combined with a critic-repair philosophy and incorporating a difficulty-aware trajectory filtering strategy for quality assurance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Fine-tuning the Qwen3-30B-A3B model on DeNovoSWE significantly boosts its performance on complex software engineering benchmarks, notably raising the score from 5.8% to 47.2% on the BeyondSWE-Doc2Repo benchmark.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.10728" target="_blank">https://huggingface.co/papers/2606.10728</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233147018.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>39. TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TRL-Bench, Tabular Representation Learning, Tabular Encoders, End-to-End Pipelines, Data-Lake Table Enrichment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to establish a standardized benchmark called TRL-Bench that evaluates tabular representation learning models across different granularities and task types.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The methodology involves exporting row, column, or table embeddings and testing them with lightweight probes across TRL-CTbench, TRL-Rbench, and TRL-DLTE for comprehensive assessment in a cross-paradigm context.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TRL-Bench reveals that tabular encoder performance is capability-specific and cannot be solely evaluated through single leaderboard rankings, emphasizing the need for compatibility in downstream task conditions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09323" target="_blank">https://huggingface.co/papers/2606.09323</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233120506.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>40. Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Claw-SWE-Bench, OpenClaw, adapter protocol, coding agents, SWE-bench</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces Claw-SWE-Bench, a new benchmark and adapter protocol, to enable fair comparison and enhance performance evaluation of diverse coding agents, particularly emphasizing the significance of adapter design.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The protocol standardizes evaluation conditions with a focus on fair settings, utilizing a comprehensive 350 GitHub issue instance benchmark across 8 languages and introducing a lite version for expedited validation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Results demonstrate that the adapter design is critical; with the right adapter, coding agents like OpenClaw can significantly boost performance from 19.1% to 73.4% Pass@1 score, stressing the importance of harness choice and cost considerations in the evaluation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.12344" target="_blank">https://huggingface.co/papers/2606.12344</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260611233054765.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>41. Toward Generalist Autonomous Research via Hypothesis-Tree Refinement</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Autonomous scientific research, AI framework, Hypothesis Tree Refinement, Iterative experimentation, Autonomous Optimization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore how an AI agent can autonomously facilitate scientific research by coordinating and executing hypothesis tests while maintaining a knowledge tree that refines research over time.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of Arbor, an AI framework that consists of a long-lived coordinator and short-lived executors to manage and test hypotheses, respectively, employing Hypothesis Tree Refinement for strategic research evolution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Arbor demonstrates significant improvements in six research tasks across model training, harness engineering, and data synthesis, achieving superior results compared to Codex and Claude Code with notable gains in efficiency and outcome quality.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.11926" target="_blank">https://huggingface.co/papers/2606.11926</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260611233022489.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260611233404666.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260611233556880.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260611233022489.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>Global AI Native Industry Insights &#8211; 20260611 &#8211;  Anthropic &#124; Google &#124; Cursor &#124; more</title>
		<link>https://ainativefoundation.org/global-ai-native-industry-insights-20260611-anthropic-google-cursor-more/</link>
		
		<dc:creator><![CDATA[AINF]]></dc:creator>
		<pubDate>Thu, 11 Jun 2026 04:06:49 +0000</pubDate>
				<category><![CDATA[Global Industry]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/global-ai-native-industry-insights-20260611-anthropic-google-google-more/</guid>

					<description><![CDATA[Anthropic's Claude Fable 5, Google DiffusionGemma, Gemini 3.5, Cursor Bugbot.]]></description>
										<content:encoded><![CDATA[<p>Anthropic&#8217;s Claude Fable 5, Google DiffusionGemma, Gemini 3.5, Cursor Bugbot. Discover more in Today’s Global AI Native Industry Insights.</p>
<h3>1.  Anthropic releases Claude Fable 5 and Claude Mythos 5 with enhanced capabilities and safeguards</h3>
<p>Anthropic announced Claude Fable 5, the first publicly available Mythos-class model featuring state-of-the-art performance in software engineering, knowledge work, and vision tasks. The model includes built-in safeguards that block responses in high-risk areas like cybersecurity and biology, instead routing to Claude Opus 4.8 for safety. Concurrently, Anthropic launched Claude Mythos 5 for trusted partners in the Glasswing program, providing unrestricted access to advanced capabilities for cybersecurity professionals. Both models represent a significant jump in capability over previous versions and are priced at $10 per million input tokens and $50 per million output tokens.</p>
<p>Read more: <a href="https://www.anthropic.com/news/claude-fable-5-mythos-5">https://www.anthropic.com/news/claude-fable-5-mythos-5</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/20260611_en_claude.png"><source src="https://cdn.ainative.foundation/20260611_en_claude.mp4" type="video/mp4"></video></p>
<p>Video Credit: Claude Youtube Channel</p>
<h3>2.  Google releases DiffusionGemma experimental open model with 4x faster text generation using diffusion-based parallel processing</h3>
<p>Google released DiffusionGemma, an experimental open model under Apache 2.0 license that uses text diffusion to generate entire blocks of text simultaneously rather than token-by-token. The 26B Mixture of Experts model activates only 3.8B parameters during inference and delivers up to 4x faster text generation on dedicated GPUs, achieving speeds of 1000+ tokens per second on NVIDIA H100 and 700+ tokens per second on RTX 5090. The model is designed for researchers and developers working on latency-sensitive workflows including code infilling, inline editing, and non-linear text generation. Google notes that DiffusionGemma prioritizes speed over quality and performs below standard Gemma 4 on benchmarks.</p>
<p>Read more: <a href="https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/">https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260611_e880ffb004c94373a49edcdc40084116.jpg"><source src="https://cdn.ainative.foundation/video/20260611_dfe26df33d434535b1cf33e15bf82234.mp4" type="video/mp4"></video></p>
<p>Video Credit: @googlegemma on X</p>
<h3>3.  Google releases Gemini 3.5 Live Translate for real-time speech translation in 70+ languages</h3>
<p>Google announced the release of Gemini 3.5 Live Translate, a new audio model that provides live speech-to-speech translation in over 70 languages. The model generates continuous translated speech while preserving speaker intonation, pacing, and pitch, staying just seconds behind the speaker throughout sessions. The technology is rolling out to Google Translate on Android and iOS, Google Meet with expanded language support, and is available to developers through the Gemini Live API and Google AI Studio. All generated audio includes SynthID watermarks for AI content detection.</p>
<p>Read more: <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-live-3-5-translate/">https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-live-3-5-translate/</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260611_a832a2d9b40344dfb1d37acfdae34311.jpg"><source src="https://cdn.ainative.foundation/video/20260611_1b6722bcb38845f0ad75c4e875633725.mp4" type="video/mp4"></video></p>
<p>Video Credit: @GoogleAI on X</p>
<h3>4.  Cursor announces Bugbot code review agent is 3x faster, 22% cheaper, and finds 10% more bugs</h3>
<p>Cursor released major improvements to its Bugbot code review agent, making it over 3x faster to run, 22% cheaper, and capable of finding 10% more bugs per review. The company reports that 90% of Bugbot runs now finish in under three minutes. Users can also access a new /review command to run Bugbot locally to catch and fix issues before pushing code to repositories.</p>
<p>Read more: <a href="https://cursor.com/blog/bugbot-updates-june-2026">https://cursor.com/blog/bugbot-updates-june-2026</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260611_3b662f7723364698927efe7b18a33bda.jpg"><source src="https://cdn.ainative.foundation/video/20260611_30bcc53f9a82418b946c1900fab279e0.mp4" type="video/mp4"></video></p>
<p>Video Credit: @cursor_ai on X</p>
<div style="width:100%;height:2px;background:#808080;margin:10px 0"></div>
<p>That’s all for today’s Global AI Native Industry Insights. Join us at <a href="https://member.ainativefoundation.org/">AI Native Foundation Membership Dashboard</a> for the latest insights on AI Native, or follow our linkedin account at <a href="https://www.linkedin.com/company/ainativefoundation/">AI Native Foundation</a> and our twitter account at <a href="https://x.com/AINativeF">AINativeF</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/20260611_en_claude.mp4" length="467002" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260611_dfe26df33d434535b1cf33e15bf82234.mp4" length="281509" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260611_1b6722bcb38845f0ad75c4e875633725.mp4" length="2544353" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260611_30bcc53f9a82418b946c1900fab279e0.mp4" length="961029" type="video/mp4" />

			</item>
		<item>
		<title>China AI Native Industry Insights &#8211; 20260610 &#8211;  Huawei &#124; Xiaomi MiMo &#124; Kimi &#124; more</title>
		<link>https://ainativefoundation.org/china-ai-native-industry-insights-20260610-huawei-xiaomi-mimo-kimi-more/</link>
		
		<dc:creator><![CDATA[AINF]]></dc:creator>
		<pubDate>Wed, 10 Jun 2026 08:46:57 +0000</pubDate>
				<category><![CDATA[China Industry]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/china-ai-native-industry-insights-20260610-huawei-xiaomi-mimo-kimi-more/</guid>

					<description><![CDATA[Explore Huawei's AI, Xiaomi's speed, Kimi Code's upgrade.]]></description>
										<content:encoded><![CDATA[<p>Explore Huawei&#8217;s AI, Xiaomi&#8217;s speed, Kimi Code&#8217;s upgrade. Discover more in Today’s China AI Native Industry Insights.</p>
<h3>1.  Huawei Cloud launches Agentic Infra paradigm and AI products at INSPIRE 2026 conference</h3>
<p>Huawei Cloud unveiled Agentic Infra, a new infrastructure paradigm for general and AI workloads at its INSPIRE 2026 conference in Shanghai. The company released four core products including AI Cluster Service supporting over 100,000 cards with 200 EFLOPS computing power, Agentic Memory Storage with petabyte-scale memory space, ModelArtsNext training platform, and AgentArts enterprise agent platform. The new infrastructure features efficient token generation, continuous learning, unified compute scheduling, and secure autonomy for enterprise AI deployment.</p>
<p>Read more: <a href="https://www.huawei.com/en/news/2026/6/inspire-agenticera-agenticinfra">https://www.huawei.com/en/news/2026/6/inspire-agenticera-agenticinfra</a></p>
<p><video width="600" height="400" controls poster="https://www-file.huawei.com/dam/asset/view/20260608092949465004136616897771.jpg"><source src="https://cdn.ainative.foundation/video/0260610_cn_huawei.mp4" type="video/mp4"></video></p>
<p>Video Credit: NotebookLM</p>
<h3>2.  Xiaomi MiMo-V2.5-Pro-UltraSpeed achieves breakthrough of 1000 tokens per second inference speed for 1T parameter model</h3>
<p>Xiaomi and TileRT jointly announced the MiMo-V2.5-Pro-UltraSpeed mode, marking the first time a trillion-parameter model has exceeded 1000 tokens per second in output speed. The breakthrough was achieved through co-design optimization including FP4 quantization for MoE experts and DFlash speculative decoding with block-level masked parallel prediction, running on standard 8-GPU nodes. The UltraSpeed API is now available through an application-based limited release from June 9 to June 23, 2026, priced at 3x the standard MiMo-V2.5-Pro rate while delivering approximately 10x faster output. Over 3000 enterprises and developers from various industries including law, finance, logistics, and automotive manufacturing applied for trial access within 12 hours of the announcement.</p>
<p>Read more: <a href="https://mp.weixin.qq.com/s/EZvmrx8xfM9MZNCMDwImFQ">https://mp.weixin.qq.com/s/EZvmrx8xfM9MZNCMDwImFQ</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/video/20260610_cn_xiaomi_img.png"><source src="https://cdn.ainative.foundation/video/20260610_cn_xiaomi.mp4" type="video/mp4"></video></p>
<p>Video Credit: The original article</p>
<h3>3.  Kimi Code releases major upgrade with one-command installation, video understanding, and ACP protocol support</h3>
<p>Moonshot AI announced a major version upgrade for Kimi Code, its open-source Coding Agent product. The update introduces one-command installation with millisecond-level startup, video understanding capabilities including color grading extraction and video editing, and integration with authoritative data sources such as TongHuaShun and Tianyancha for real-time stock prices and financial reports. The upgrade also adds support for the ACP protocol, enabling use within JetBrains and Zed editors, and includes a comprehensive hook ecosystem for integration with other tools. Users can access the updated version at kimi.com/code.</p>
<p>Read more: <a href="https://mp.weixin.qq.com/s/sqGuRyeU9AvZZ3IXj8ddJg">https://mp.weixin.qq.com/s/sqGuRyeU9AvZZ3IXj8ddJg</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/video/202606%600_cn_kimicode_img.png"><source src="https://cdn.ainative.foundation/video/20260610_cn_kimicode.mp4" type="video/mp4"></video></p>
<p>Video Credit: The original article</p>
<div style="width:100%;height:2px;background:#808080;margin:10px 0"></div>
<p>That’s all for today’s China AI Native Industry Insights. Join us at <a href="https://member.ainativefoundation.org/">AI Native Foundation Membership Dashboard</a> for the latest insights on AI Native, or follow our linkedin account at <a href="https://www.linkedin.com/company/ainativefoundation/">AI Native Foundation</a> and our twitter account at <a href="https://x.com/AINativeF">AINativeF</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/video/0260610_cn_huawei.mp4" length="14830953" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260610_cn_xiaomi.mp4" length="5416790" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260610_cn_kimicode.mp4" length="2506713" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260609</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260609/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Wed, 10 Jun 2026 00:41:46 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260609/</guid>

					<description><![CDATA[1. SWE-Explore: Benchmarking How Coding Agents Explore Repositories 🔑 Keywords: SWE-Explore, coding agents, repository exploration, line budget, agentic exploration 💡 Category: AI [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. SWE-Explore: Benchmarking How Coding Agents Explore Repositories</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SWE-Explore, coding agents, repository exploration, line budget, agentic exploration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; SWE-Explore introduces a benchmark to assess the repository exploration capabilities of coding agents, focusing on ranked lists of code within line budgets to surpass traditional retrieval methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study evaluates 848 issues across 10 programming languages and 203 repositories, emphasizing metrics like coverage, ranking, and context-efficiency, derived from agent trajectories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Agentic explorers demonstrate superior performance compared to classical retrieval methods, notably in line-level coverage and efficient ranking, which differentiate state-of-the-art explorers.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07297" target="_blank">https://huggingface.co/papers/2606.07297</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233006016.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>2. Latent Spatial Memory for Video World Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Video world models, Latent spatial memory, Diffusion latent space, End-to-end video generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce latent spatial memory for video world models to eliminate pixel-space reconstruction overhead.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implement a framework called Mirage that constructs 3D memory directly in diffusion latent space through depth-guided back-projection and novel view synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method achieves up to 10.57 times faster video generation and reduces memory footprint by 55 times compared to traditional methods, achieving state-of-the-art performance on benchmarks like WorldScore and RealEstate10K.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09828" target="_blank">https://huggingface.co/papers/2606.09828</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260609233056967.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>3. Agents&#8217; Last Exam</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI agents, real-world tasks, industry clusters, task taxonomy, living benchmark</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces Agents&#8217; Last Exam (ALE), a benchmark tailored to evaluate AI agents on long-term, economically valuable real-world tasks that cover 13 industry clusters and over 1,000 tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; ALE was developed in collaboration with over 250 industry experts, structured around a task taxonomy with 55 subfields, and designed to continuously grow with new workflows and industries.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current evaluation results show a significant gap between AI benchmark performance and real-world deployment, with an average full pass rate of only 2.6%. ALE is intended to bridge this gap by offering a more practical measure of AI impact on GDP-relevant tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05405" target="_blank">https://huggingface.co/papers/2606.05405</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233031746.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>4. FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Lookahead Sparse Attention, Neural Memory Indexer, KV Cache, FlashMemory, Dual-Encoder Architecture</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the GPU memory bottleneck caused by conventional LLMs during decoding by introducing Lookahead Sparse Attention (LSA) empowered by a Neural Memory Indexer.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The LSA technique involves proactive management of the KV cache by predicting future context demands.</p>
<p>   &#8211; Utilizes a decoupled training strategy with a standard dual-encoder architecture, trained independently using standard retrieval training frameworks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed LSA approach significantly reduces GPU memory usage for long-context tasks, compressing the average physical KV cache footprint to 13.5% compared to the full-context baseline.</p>
<p>   &#8211; Maintains or slightly improves downstream accuracy, achieving a +0.6% margin on average and reducing KV cache overhead by over 90% at extreme scales without impacting reasoning capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09079" target="_blank">https://huggingface.co/papers/2606.09079</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233133659.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>5. Human Psychometric Questionnaires Mischaracterize LLM Behavior</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM behavior, real-world interactions, generation-based profiling, psychometric questionnaires, generation probabilities</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To determine the reliability of human psychometric questionnaires in predicting LLM behavior during everyday user interactions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Comparison of LLM value and personality profiles derived from Likert self-reports and generation probabilities over value-laden responses to user queries.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Psychometric questionnaires are insufficient for predicting LLM behavior as they fail to replicate realistic user query responses, while generation-based profiling provides a more accurate understanding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2509.10078" target="_blank">https://huggingface.co/papers/2509.10078</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233157754.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>6. End-to-End Context Compression at Scale</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Latent Context Language Models, Compression Ratios, Encoder-decoder Compression, Architecture Search, Pre-training</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance encoder-decoder compression techniques through architectural search and extensive pre-training to develop Latent Context Language Models (LCLMs) that efficiently manage long contexts with improved performance and memory usage compared to traditional KV cache methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors conducted an architecture search and pre-trained various encoder-decoder model variants from scratch to determine optimal design and training strategies. They introduced LCLMs with pre-trained encoder-decoder models at different compression ratios of 1:4, 1:8, and 1:16 on over 350B tokens.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The LCLMs effectively enhance the Pareto frontier of general-task performance, compression speed, and memory usage. They serve as efficient backbones for long-horizon agents, facilitating skim through compressed contexts and adaptive expansion of relevant segments when needed.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09659" target="_blank">https://huggingface.co/papers/2606.09659</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233225957.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>7. A Geometric Account of Activation Steering through Angle-Norm Decomposition</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: hidden-state norm, angular structure, spherical steering, activation steering, language models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to challenge the notion that hidden-state norms carry concept-relevant information in language models and to understand how concepts are represented through angular structures and norms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conducted a controlled empirical study to explore the roles of angular and radial components by comparing different steering methods in language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Findings indicate concepts are mainly represented through angular structures, yet the hidden-state norm is crucial for stability and effectiveness in steering methods. The study suggests that activation steering should be parameterized by angular and radial components for better interpretability and performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06735" target="_blank">https://huggingface.co/papers/2606.06735</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233254997.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>8. SwiftVR: Real-Time One-Step Generative Video Restoration</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: real-time video restoration, consumer GPUs, efficient attention mechanisms, lightweight autoencoding, causal chunk-wise protocol</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enable real-time video restoration on consumer GPUs achieving high frame rates at 4K resolution through efficient attention mechanisms and lightweight autoencoding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of mask-free shifted-window self-attention for efficient spatial window processing and lightweight restoration-aware autoencoding for fast, quality-preserving chunk-wise decoding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SwiftVR sustains significant frame rates on high-resolution settings and is the first generative VR model enabling real-time 1080p streaming on consumer-grade GPUs, ensuring strong no-reference perceptual quality with low inference cost.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09516" target="_blank">https://huggingface.co/papers/2606.09516</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233319488.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>9. Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Hallucinations, Whisper ASR, Sparse AutoEncoder, Activation-space steering, SAE latent-space steering</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to detect and reduce hallucinations in Whisper ASR using internal representations from audio encoder activations and Sparse AutoEncoder latents.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research involves extracting audio encoder activations and evaluating two representation spaces: raw Whisper activations and Sparse AutoEncoder latents. Two strategies are proposed for steering: activation-space steering and SAE latent-space steering.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The results show a remarkable reduction in hallucination rates, from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3, with minimal transcription degradation, demonstrating the effectiveness of the proposed strategies. </p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07473" target="_blank">https://huggingface.co/papers/2606.07473</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233345236.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>10. DEI: Diversity in Evolutionary Inference for Quality-Diversity Search</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Heterogeneous Large Language Models, Evolutionary Inference, Quality-Diversity, Mutation Operators, Cross-Model Adversarial Pressure</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To showcase how using heterogeneous large language models as mutation operators in a distributed Quality-Diversity search framework can enhance evolutionary inference.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented DEI, a distributed Quality-Diversity framework that utilizes heterogeneous LLMs for mutation operations, extending the Digital Red Queen framework for cross-model competition and robustness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DEI&#8217;s use of model diversity significantly improves performance over homogeneous setups, demonstrated by higher QD-Scores and coverage in the Core War domain. This highlights the importance of model diversity as opposed to mere parallelism in distributed LLM-based QD search.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.27130" target="_blank">https://huggingface.co/papers/2605.27130</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233410018.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>11. SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SlimSearcher, Pareto-efficient trajectory filtering, Adaptive Reward Shaping, Reinforcement Learning, computational efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces SlimSearcher, a framework designed to improve the efficiency of deep research agents by balancing the trade-off between computational costs and accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework employs Pareto-efficient trajectory filtering during Supervised Fine-Tuning and Adaptive Reward Shaping during Reinforcement Learning to enhance efficiency and accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments on benchmarks such as GAIA, BrowseComp, and XBenchDeepSearch show that SlimSearcher can reduce tool-call rounds by 17%-58% while maintaining or enhancing accuracy.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07074" target="_blank">https://huggingface.co/papers/2606.07074</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233437284.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>12. Liberating LLM Capabilities in Full-Duplex Speech Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: tri-channel speech interface, full-duplex interaction, Listen-Write-Speak, text-first paradigm, autoregressive LLM</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Human-AI Interaction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper proposes a text-first tri-channel speech interface that emphasizes the importance of visible text output alongside spoken responses for real-time and structured conversational tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces the Listen-Write-Speak (LWS) paradigm using an autoregressive LLM to handle audio, text, and speech concurrently with a shared causal attention context, leveraging a Token Schema without architectural changes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study demonstrates that visible writing can effectively serve as a first-class output channel for speech interaction, maintaining high responsiveness and performance across multiple benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07547" target="_blank">https://huggingface.co/papers/2606.07547</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233502212.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>13. Light-WAM: Efficient World Action Models with State-Fusion Action Decoding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Light-WAM, World Action Models, robot manipulation, video backbone, StateFusionActionExpert</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The main goal is to develop a lightweight World Action Model called Light-WAM for efficient robot manipulation that incorporates future-video supervision to enhance temporal structure representation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Use a compact video backbone and downsampled latent space to reduce video co-training costs.</p>
<p>   &#8211; Implement the StateFusionActionExpert to directly predict action chunks with learned-query pooling from multiple backbone layers.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Light-WAM demonstrates strong performance on LIBERO and achieves functional multi-task performance on RoboTwin 2.0 with only 0.44B trainable parameters.</p>
<p>   &#8211; Achieves low inference latency of 72.03ms with 4.1GiB peak GPU memory usage and improved training throughput.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08242" target="_blank">https://huggingface.co/papers/2606.08242</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233527242.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>14. SDR: Set-Distance Rewards for Radiology Report Generation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: set-based rewards, embedding distances, chest X-ray report generation, vision&#8211;language models, reinforcement learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve chest X-ray report generation by employing set-based rewards using embedding distances that facilitate effective post-training and test-time selection without the need for causal reasoning structures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Used a set-based approach where reports are split into sentences and transformed into unordered embedding sets.</p>
<p>   &#8211; Proposed set-to-set distances between generated and reference embeddings as continuous, permutation-invariant rewards.</p>
<p>   &#8211; Conducted experiments across two datasets and three vision-language models, comparing post-training set-to-set distance based rewards against supervised fine-tuning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Set-to-set distance based rewards consistently outperform supervised fine-tuning on all headline metrics with notable improvements in BERTScore, RadGraph F1, and CheXbert F1 scores.</p>
<p>   &#8211; The approach facilitates test-time best-of-N selection, providing a significant performance improvement over random selection.</p>
<p>   &#8211; Set-distance rewards enable more efficient test-time scaling, reducing generated tokens while maintaining quality, thus establishing them as a unified signal for both post-training and test-time scaling in chest X-ray report generation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.00440" target="_blank">https://huggingface.co/papers/2606.00440</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233651500.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>15. Trajectory-Refined Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-policy distillation, Prefix failure, Trajectory-refined distillation, Large language models, Teacher guidance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to address prefix failure in on-policy distillation (OPD) for large language models by proposing trajectory-level corrections.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The paper introduces Trajectory-Refined Distillation (TRD), a method that corrects student&#8217;s rollouts at the trajectory level under teacher guidance to mitigate prefix failure.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TRD successfully improves exploration by exposing students to alternative valid derivations, enhances single-attempt accuracy, and broadens reasoning coverage, outperforming prior baselines across various benchmarks and scales.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08432" target="_blank">https://huggingface.co/papers/2606.08432</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233624814.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>16. DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Deep Research, Multi-agent Framework, Long-form Synthesis, Planning, Tool Ecosystem</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The aim is to create a multi-agent framework called DuMate-DeepResearch to address the challenges of deep research tasks, improving planning, evidence acquisition, and report synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework decouples task understanding, planning, and scheduling from evidence acquisition and report rendering, making decisions traceable using a dynamic optimization approach.</p>
<p>   &#8211; Introduces dynamic graph-based planning, recursive two-level execution, and rubric-based test-time optimization mechanisms.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DuMate-DeepResearch achieved state-of-the-art results on two benchmarks, marking top scores in both overall performance and specific metrics such as information recall and analysis.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07299" target="_blank">https://huggingface.co/papers/2606.07299</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233554388.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>17. Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EvalCards, AI evaluation, interpretive signals, benchmark metadata, score comparability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop EvalCards, a framework that standardizes and unifies AI evaluation reporting across various platforms to overcome inconsistencies and facilitate reliable comparisons.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A structured review of 52 papers and 10 stakeholder interviews to derive a reporting schema.</p>
<p>   &#8211; Implementation of four key interpretive signals: reproducibility, documentation completeness, provenance and risk, and score comparability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; A new operational reporting layer was created, deploying a monitoring tool that applied to thousands of models, benchmarks, and results, exposing systematic gaps in current AI evaluation reporting practices.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09809" target="_blank">https://huggingface.co/papers/2606.09809</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233716118.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>18. WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WorldCraft, video-based world models, object-level trajectory actions, camera navigation, trajectory-centric control</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce WorldCraft, a framework that extends video-based world models to include object-level trajectory control while maintaining camera navigation functionalities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a trajectory-centric control pipeline with components like Normalized World Trajectory, Spatial-Pathway LoRA, and Trajectory-Anchored State Persistence to achieve simultaneous object manipulation and camera navigation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; WorldCraft successfully enables precise object control, preserves camera navigation fidelity, and maintains object state across extended scenarios, even during off-camera excursions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.25077" target="_blank">https://huggingface.co/papers/2605.25077</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233829386.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>19. AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AsyncWebRL, asynchronous reinforcement learning, trajectory normalization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This research aims to enhance vision-language web agent training by employing asynchronous reinforcement learning and modifying trajectory normalization to achieve faster throughput and better performance on challenging tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces AsyncWebRL, which combines an asynchronous design with specific adaptations like an everlasting rollout pool and lightweight screenshot handling, resulting in a significant speedup in training throughput.</p>
<p>   &#8211; Implementing a modification in trajectory normalization by replacing 1/|τ_i| with a constant 1/k, improving trajectory shortening while maintaining success rates.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The AsyncWebRL approach sets a new open-source state-of-the-art performance on the WebGym out-of-distribution test split, with notable performance improvements on more difficult tasks, achieving up to +48% relative gain on the hardest slice.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05597" target="_blank">https://huggingface.co/papers/2606.05597</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260609233753796.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>20. Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM agents, exploit-resistant verifiers, reward hacking, hacker-fixer loop, Terminal Wrench</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to identify vulnerabilities in agent benchmark verification systems and develop an automated iterative process using LLM agents to create robust verifiers that resist exploitation while maintaining legitimate task performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; An audit was conducted on 1,968 tasks across five terminal-agent benchmarks to assess hackability by frontier models.</p>
<p>   &#8211; The introduction of the hacker-fixer loop, which uses three LLM agents iteratively: a hacker, a fixer, and a solver to build exploit-resistant verifiers without per-task manual patching.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The hacker-fixer loop significantly reduced attack success rates; for example, it brought the attack success rate from 62% to 0% on KernelBench.</p>
<p>   &#8211; Weaker agents in the loop were effective against more powerful attackers, underscoring the loop&#8217;s robustness in identifying and mitigating exploits.</p>
<p>   &#8211; Terminal Wrench was released as a snapshot of the current attack surface and a basis for future research, including patched verifiers and discovered exploits.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08960" target="_blank">https://huggingface.co/papers/2606.08960</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233855638.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>21. Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Rectified Flows, Membership Inference Attack, training data traces, interpolation path, bell-shaped curve</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To understand what generative models retain from training data and its implications for privacy and copyright.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Analysis of the interpolation path in Rectified Flows to identify differences in the reconstruction of training and test data, and derivation of a maximum point under Gaussian assumptions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study suggests that Rectified Flows encode subtle traces of training data exploitable for membership inference attacks, with a universal bell-shaped structure identified in the data reconstruction curve.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07271" target="_blank">https://huggingface.co/papers/2606.07271</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234015770.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>22. Chiaroscuro Attention: Spending Compute in the Dark</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CHIAR-Former, spectral entropy, self-attention, DCT, attention FLOPs</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance transformer efficiency on large text datasets by dynamically routing tokens using spectral entropy to select optimal operators.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A 4-layer hybrid transformer, CHIAR-Former, is proposed, which routes tokens among DCT spectral mixing, RBF kernel mixing, and full self-attention, with evaluations conducted on datasets like WikiText-103 and IMDB sentiment classification.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The CHIAR-Former demonstrated a 45% improvement in performance over traditional full-attention models on WikiText-103 with reduced computational resources, indicating the advantages of spectral routing in large-scale text processing.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08327" target="_blank">https://huggingface.co/papers/2606.08327</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233949924.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>23. OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: simulation-data-driven framework, humanoid loco-manipulation, 3D generative model, hierarchical visuomotor policy, domain randomization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to enhance humanoid robot loco-manipulation by utilizing a simulation-data-driven framework named OASIS, leveraging simulation to overcome limitations in traditional robot manipulation task demonstrations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors employ a 3D generative model to create realistic assets and collect trajectories through teleoperation in simulation, which are further augmented with domain randomization. They design a hierarchical visuomotor policy based on this augmented simulation data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed framework, OASIS, demonstrates that policies trained on simulated data achieve better zero-shot performance compared to those trained on real-robot teleoperation data, ensuring higher success rates on various tasks by capturing broader environmental variations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08548" target="_blank">https://huggingface.co/papers/2606.08548</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260609233920723.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>24. Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reference-free faithfulness, precision, recall, grounded generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to address the limitation of reference-free faithfulness metrics that only measure precision, proposing a new metric that combines precision and recall to offer a more comprehensive evaluation of generated content&#8217;s faithfulness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers utilize Formula 1 telemetry as a deterministic domain to measure both precision and recall by having access to complete ground truth. They conduct experiments on a multilingual benchmark and a second complete-oracle domain, NOAA weather forecasts, to validate their metric.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study finds that high-precision models often have poor fact coverage, thus ranking lower when evaluated with both precision and recall. A new verifier-guided generation method is proposed, improving precision and recall without needing references, demonstrating the effectiveness of their proposed metric.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09376" target="_blank">https://huggingface.co/papers/2606.09376</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234134588.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>25. Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Models, Lean4, Formal Verification, Multi-step Workflows, Agent Behavior</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary aim is to enhance the reliability and performance of multi-step workflows in Large Language Models (LLMs) using a formal verification framework with Lean4, a dependent-type formal language.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of Lean4 Agent, which utilizes Lean4 to model and verify agent behavior, with the FormalAgentLib library to ensure semantic consistency and debug workflow execution, and LeanEvolve to enhance workflows utilizing results from FormalAgentLib.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Achieved an 11.94% improvement on verification-passing workflows over failing ones and enhanced SWE performance by 7.47% using LeanEvolve. Stablished a foundational framework for formal modeling and verification of agent behavior with dependent-type formal languages.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06523" target="_blank">https://huggingface.co/papers/2606.06523</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234106773.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>26. Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Latent Visual Reasoning, Supervised Latent Tokens, Cosine Similarity, Information Bottleneck, Vision-Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Challenge conventional views on the relationship between latent mechanisms and accuracy in vision-language models (VLMs), specifically focusing on the correlation between cosine alignment of supervised latents and model accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Experimentation with a designed matrix of five different Latent Visual Reasoning (LVR) variants to evaluate the correlation between cosine alignment and accuracy.</p>
<p>   &#8211; Introduction of PRISM diagnostics: a linear probe to determine answer decodability and a corruption test to assess the dependency on latent states.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study reveals an inverse correlation between the cosine alignment of supervised latents and model accuracy (r=-0.94).</p>
<p>   &#8211; Answers in VLMs are decoded from downstream latents rather than directly within them, indicating limited dependency on these latent states.</p>
<p>   &#8211; An Information Bottleneck approach demonstrates that auxiliary objectives reshape models through shared parameters, rather than exclusively through the targeted latent variables.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05753" target="_blank">https://huggingface.co/papers/2606.05753</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234041316.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>27. Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language models, machine translation, low-resource languages, linguistic reasoning traces, in-context learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore the potential of large language models to improve machine translation for extremely low-resource languages by using structured linguistic reasoning traces.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Proposing a pipeline for generating linguistic reasoning traces from resources like Universal Dependencies treebanks, dictionaries, and grammar-rule banks.</p>
<p>   &#8211; Evaluating linguistic reasoning traces in the contexts of in-context learning, supervised fine-tuning, and reinforcement fine-tuning, specifically for Xibe and Chintang languages.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Linguistic reasoning traces significantly enhance translation performance during inference as reliable sentence-specific traces improve performance across models and languages.</p>
<p>   &#8211; Using traces as training data results in less consistent improvements, indicating that effective inference-time guidance can better leverage grammatical information for low-resource machine translation, while generating reliable analyses remains challenging.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03782" target="_blank">https://huggingface.co/papers/2606.03782</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234234434.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>28. Set-Based Transformer for Atmospheric Compensation in Standoff LWIR Hyperspectral Imaging</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Lightweight deep learning, LWIR hyperspectral imaging, atmospheric compensation, transmittance estimation, sparse autoencoder</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop a lightweight deep learning framework for atmospheric compensation in passive long-wave infrared hyperspectral imaging.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize a set-based deep learning framework to jointly estimate transmittance, atmospheric path radiance, and downwelling spectrum from multi-range radiance measurements. Analyze learned representation using a sparse autoencoder.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The framework demonstrates low spectral distortion in atmospheric compensation tasks, with geographically coherent latent features emerging without location supervision. Publicly available dataset and code.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08324" target="_blank">https://huggingface.co/papers/2606.08324</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234202156.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>29. Pruning and Distilling Mixture-of-Experts into Dense Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mixture-of-Experts, Knowledge Distillation, Memory-Constrained Deployment, Dense Architectures, Scoring Method</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a systematic framework for converting Mixture-of-Experts (MoE) models into fully dense architectures, addressing memory constraints during deployment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Experts within MoE are scored, selected, and grouped through a variety of methods, then concatenated into a dense feedforward network. Knowledge distillation is applied from the original MoE to refine the dense architecture.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The novel diversity-aware scoring method outperforms previous methods in various testing configurations, demonstrating a significant improvement in downstream accuracy (+6.3 pp) and training speed (1.6x faster) compared to traditional pruning techniques.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.28207" target="_blank">https://huggingface.co/papers/2605.28207</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234300097.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>30. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202606091781048606.jpg"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>31. EMMA: Extracting Multiple physical parameters from Multimodal Data</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EMMA, physics-informed, multimodal, Liquid Time-Constant, dynamical parameters</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce and utilize EMMA, a novel physics-informed multimodal framework, for recovering dynamical parameters directly from raw video, audio, and image data.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a Liquid Time-Constant network and physics-constrained loss for learning latent dynamics and enforcing consistency with differential equations. </p>
<p>   &#8211; A unified feature pipeline enables the alignment of data across various modalities without the need for additional segmentations or sensors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EMMA achieves robust multi-parameter recovery and outperforms existing single-modality baselines. It is established as a scalable solution for extracting physics-consistent models from multimodal data.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.24047" target="_blank">https://huggingface.co/papers/2605.24047</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260609234313480.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>32. Honest Lying: Understanding Memory Confabulation in Reflexive Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reflexion-style agents, self-generated reflections, memory confabulation, Reflection Repetition Rate, trajectory-level failure signals</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to identify and address the issue of persistent errors in Reflexion-style agents due to incorrect self-generated reflections, specifically measured by a new metric called the Reflection Repetition Rate (RRR).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The research utilizes the Reflection Repetition Rate (RRR) metric to detect repeated reliance on incorrect reflections across environments such as ALFWorld and HumanEval and employs programmatic extraction of trajectory-level failure signals to mitigate these errors.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study finds that memory confabulation leads to persistent errors and incorrect task interpretations. Mitigation strategies significantly increase correct object mention and reduce RRR, demonstrating that improvements in reflective memory processes can reduce errors and support more accurate task execution.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.29463" target="_blank">https://huggingface.co/papers/2605.29463</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234248125.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>33. CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CIPER, Cross-view geo-localization, 3-DoF pose estimation, transformer encoder, multi-task objective</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study presents CIPER, aiming to solve cross-view geo-localization by improving simultaneous city-scale retrieval and precise 3-DoF pose estimation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a shared transformer encoder with task-specific tokens and a two-way transformer pose decoder for disentangling retrieval features and improving localization accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CIPER demonstrates competitive performance on VIGOR, KITTI, and Ford Multi-AV datasets, particularly in scenarios with limited field-of-view and variable orientations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05011" target="_blank">https://huggingface.co/papers/2606.05011</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234214195.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>34. PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Local Benchmark-Generation Pipeline, Property Graphs, Text2Cypher, Execution Validation, Diversity Controls</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop PIPE-Cypher, a local benchmark-generation pipeline that transforms live property graphs and seed queries into balanced NL-to-Cypher datasets for enterprise knowledge graphs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes processes such as schema profiling, reverse-query grounding, constrained generation, execution validation, and the employment of a calibrated local LLM judge, utilizing local Qwen3.5-9B generation and judging.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PIPE-Cypher consistently creates repeatable and adaptable Text2Cypher benchmarks. It highlights that zero-shot transfer is limited whereas schema-specific example banks enhance compatible model performances.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08481" target="_blank">https://huggingface.co/papers/2606.08481</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234145989.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>35. Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SCOUT framework, prompt-injection detection, detector allocation, safety-utility threshold, SCOUT-450</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The SCOUT framework aims to dynamically allocate prompt-injection detection by predicting the reliability and latency of detectors to improve safety and efficiency over single-detector approaches.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework reframes defense as detector allocation, deciding which detectors to run per request and whether to escalate to an LLM judge, using predictions based on past detector behavior.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SCOUT reduces the attack-success rate by 46% and total wall-clock time by 40%, with only a 5.1-point drop in benign utility, when compared to an always-on GPT-4o judge. It also shows improved performance on external benchmarks, enhancing the safety-utility frontier.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.30837" target="_blank">https://huggingface.co/papers/2605.30837</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234120193.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>36. Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-Evaluation Elicitation, Reinforcement Learning, Calibration, Masked Distillation, Transferable Quality Evaluation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to improve model calibration for quality assessment through a novel method called Self-Evaluation Elicitation (SEE).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The SEE method employs calibration-coupled reinforcement learning and masked distillation in a short cycle to enhance prediction accuracy whilst maintaining answer quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The SEE method successfully surfaces a model’s latent ability to predict judge scores beyond specific preferences, demonstrating a transferable quality evaluation on various benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05122" target="_blank">https://huggingface.co/papers/2606.05122</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234053059.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>37. SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Model, compression, scaling matrices, activation-aware compression, effective-rank entropy</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To present SigmaScale, a method for learning auxiliary scaling matrices to improve the compression of Large Language Models using truncated Singular Value Decomposition.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SigmaScale optimizes vectors for diagonal row and column scaling transformations based on activation-aware compression loss, lowering the effective intrinsic rank of weight matrices.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SigmaScale demonstrates competitive performance with state-of-the-art SVD-based compression methods across benchmarks, offering a flexible route for low-rank LLM compression, which is beneficial in reducing LLM-inference computing costs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07098" target="_blank">https://huggingface.co/papers/2606.07098</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234027180.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>38. Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Skill-3D, 3D spatial reasoning, scene-aware skills, tool utilization, self-evolving</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces the Skill-3D framework which aims to improve agent performance in 3D spatial reasoning tasks by developing scene-aware skills through a self-evolving system.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Skill-3D utilizes a self-evolving memory and skill library to develop and refine scene-aware skills. This system tracks tool-use trajectories across different scenes, distilling successful ones into reusable skills and using failed attempts as learning lessons.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate significant improvements in tool utilization in 3D reasoning tasks, such as a 67% enhancement for Gemini-3-Flash on MMSI-Bench and a 43% improvement for Qwen3-VL-8B on VSI-Bench, highlighting the effectiveness of skill-guided tool use strategies.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07436" target="_blank">https://huggingface.co/papers/2606.07436</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609234000671.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>39. EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Empirical Graph Extraction, Variable-Centered, Psychology, Staged Pipeline, Typed Graphs</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to map psychology abstracts to typed graphs using normalized variables and empirical relations, specifically targeting variable-oriented empirical fields like psychology to bridge existing gaps in scientific relation extraction benchmarks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A staged pipeline approach is employed for graph extraction, which involves separate steps for variable extraction, normalization, hierarchy construction, evidence selection, relation extraction, and edge validation. This method is compared against direct extraction methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The staged pipeline approach significantly improves performance, achieving a macro-F1 of 0.74, though challenges in moderating relations and concept hierarchies remain, particularly in extracting higher-order empirical claims from abstracts.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08362" target="_blank">https://huggingface.co/papers/2606.08362</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233936370.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>40. PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Privileged Bayesian Self-Distillation, Credit Assignment, Reinforcement Learning, Bayesian Evidence Scoring, Autoregressive Decomposition</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Privileged Bayesian Self-Distillation to enable fine-grained credit assignment in long-horizon tasks by converting sparse rewards into calibrated turn-level signals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of Bayes&#8217; rule to transform the posterior-to-prior probability ratio into a tractable likelihood ratio between a student model and a privileged teacher model.</p>
<p>   &#8211; Implementation of autoregressive decomposition to derive turn-level signals from Bayesian evidence scoring.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PBSD enhances performance across various settings and facilitates effective policy learning and improved generalization by transforming sparse outcome supervision into Bayes-calibrated credit signals.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09348" target="_blank">https://huggingface.co/papers/2606.09348</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233908791.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>41. Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SkeMex, Medical Agents, Skill Memory, Clinical Decision Making, Contextual Utility</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop SkeMex, a self-evolving framework that enhances medical agent systems through structured skill memory to improve long-term clinical reasoning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a post-deployment self-evolution framework distilling informative interaction trajectories into structured skills, organized in a multi-branch repository and governed by context-dependent utility.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; SkeMex outperforms existing memory-based agents in clinical tasks, generalizes across model backbones, and supports the adaptation of transferable skill memory.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09365" target="_blank">https://huggingface.co/papers/2606.09365</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233843765.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>42. Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Trust Functions, Weak-to-Strong Generalization, Reliable Labels, Data Selection</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance weak-to-strong generalization by leveraging trust functions to identify reliable weak labels for training across various domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce trust functions that assign trust scores to weak labels, using these scores to filter weak supervision and enable iterative training chains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Trust functions enable students to match or exceed performance with ground-truth supervision, facilitating an effective weak-to-strong generalization process.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.01000" target="_blank">https://huggingface.co/papers/2606.01000</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233815690.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>43. Phase Marginalization for Patch-Grid Instability in Vision Transformers</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Phase Marginalization, Vision Transformers, patch-grid phase, dense prediction, Uniform Phase Marginalization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to address phase-dependent instability in Vision Transformers by proposing the novel method of Phase Marginalization for evaluating structured patch-grid phases and aggregating outputs in the original image coordinate system.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A post-hoc marginalization approach called Phase Marginalization is formalized to handle patch-grid phases for dense predictions without additional training. This method includes evaluating structured patch-grid phases and inverse-aligning dense outputs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; This research demonstrates that Uniform Phase Marginalization with K = 4 surpasses the traditional K = 1 baseline in segmentation, depth, and local matching experiments. It provides better performance with a modest compute-matched advantage over generic test-time augmentation in Cityscapes experiments. The study also highlights that using K = 8 or K = 16 offers minimal accuracy gain at higher costs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08132" target="_blank">https://huggingface.co/papers/2606.08132</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233730879.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>44. Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Optical reasoning, Chain-of-Thought, Large Language Models, Multimodal Large Language Models, Token efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study proposes the concept of optical reasoning, exploring the use of images as a standalone reasoning medium for language and multimodal tasks to achieve higher token efficiency compared to traditional text-based approaches.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces two variants of optical reasoning: typographic-based, which optimizes visual layouts, and graphical-based, which composes text and graphical elements into visual rationales.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Optical reasoning can match or surpass traditional text reasoning, reducing reasoning tokens by 28.57% on language tasks and 16% on multimodal tasks, thus enhancing token efficiency by 1.96 times. This indicates that images can effectively encode rationales while providing a unified visual platform for reasoning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09585" target="_blank">https://huggingface.co/papers/2606.09585</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233703422.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>45. Robotic Policy Adaptation via Weight-Space Meta-Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WIZARD, Vision-Language-Action models, LoRA parameters, meta-learning, task adaptation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to develop WIZARD, a framework providing task-specific adaptation for Vision-Language-Action models without requiring fine-tuning, utilizing language instructions and demonstration videos.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; WIZARD operates by predicting task-specific LoRA parameters in a single forward pass, using language instructions and demonstration videos during the meta-training phase to generate expert LoRA updates without target-task action labels.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments on LIBERO demonstrate that WIZARD significantly improves performance, enhancing results by up to ~2x on unseen datasets and up to ~14x on unseen tasks, especially on a Franka Emika Panda, where WIZARD surpasses a real-domain adapted baseline.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07217" target="_blank">https://huggingface.co/papers/2606.07217</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233640067.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>46. Text-to-Image Models Need Less from Text Encoders Than You Think</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Text-to-image models, text embeddings, diffusion transformer-based models, visual quality, text fidelity</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore which aspects of text representation are essential for image generation in text-to-image models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a new text embedding that captures only individual word meanings and order, lacking complete contextual information, to evaluate its impact on image generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Text-to-image models primarily depend on simple text representation aspects like word merging and order, rather than exploiting richer contextual information. The study finds that such simplified text embeddings can still guide image generation successfully, maintaining high visual quality and text fidelity.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03715" target="_blank">https://huggingface.co/papers/2606.03715</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233606205.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>47. Answer Presence Drives RAG Rewriting Gains</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: QA performance, gold answer, intervention audit, LLM rewriter, sentinel changes</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study investigates the causal factors behind the performance boost in multi-hop QA systems, specifically determining whether the presence of the gold answer in rewritten contexts is the main driver.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers conducted controlled interventions where they manipulated rewritten contexts by either removing or injecting the gold answer and assessed the impact on QA performance across multiple reader configurations and datasets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The presence of the gold answer in rewritten contexts significantly enhances QA performance, with its removal causing notable F1 decrease, and injection causing improvement. Conventional probing methods demonstrated fragility to sentinel changes.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05633" target="_blank">https://huggingface.co/papers/2606.05633</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233540525.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>48. Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement Learning, Verifiable Rewards, Reasoning Arena, Trace Tournaments, Bradley-Terry Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance reinforcement learning for large language models by introducing a more informative reward system through the Reasoning Arena framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing trace tournaments to differentiate reasoning quality among non-diverse reward groups.</p>
<p>   &#8211; Utilizing a judge system and dynamically updated trace pools for efficient relative ranking.</p>
<p>   &#8211; Applying Bradley-Terry models on incomplete comparison graphs to facilitate scalable RL integration.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Reasoning Arena outperforms the baseline RLVR by 7.6% in reasoning tasks, accelerates training speed by 27% to 41%, and reduces computation by nearly 50%.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09380" target="_blank">https://huggingface.co/papers/2606.09380</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233513534.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>49. Why Muon Outperforms Adam: A Curvature Perspective</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Muon, Adam, large language model training, curvature penalty, Normalized Directional Sharpness (NDS)</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to uncover the reasons behind Muon&#8217;s superior performance over Adam in large language model training, focusing on curvature perspectives.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Application of second-order Taylor approximation to the training landscape.</p>
<p>   &#8211; Analysis of curvature penalties through decomposition into components like squared update norm and NDS.</p>
<p>   &#8211; Investigation of training data imbalance using Zipf-Probabilistic Context-Free Grammar (PCFG).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Muon exhibits a larger one-step loss decrease and incurs a smaller second-order curvature penalty than Adam, attributed to lower NDS.</p>
<p>   &#8211; Data imbalance and heterogeneous curvature conditions amplify Muon&#8217;s advantages.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04662" target="_blank">https://huggingface.co/papers/2606.04662</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233448593.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>50. OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: OmniCap-IF, omni-modal captioning, instruction-following, format-content tradeoff, Temporal Grounding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces OmniCap-IF, the first comprehensive benchmark to evaluate instruction-following capabilities in omni-modal captioning, addressing the gap in assessing multi-modal reasoning under complex user instructions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A systematic framework is employed to evaluate captions on format correctness and content correctness across 50 distinct constraint types in pure visual, pure audio, and audio-visual modalities, including Temporal Grounding for spatio-temporal precision.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study reveals significant performance disparities among models and identifies a critical &#8220;format-content tradeoff,&#8221; explaining that increased formatting complexity degrades omni-modal reasoning. OmniCaptioner-IF, a new model, demonstrates notable improvements through a curated 54K instruction-tuning dataset.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08572" target="_blank">https://huggingface.co/papers/2606.08572</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233422053.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>51. Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reward Models, Reinforced Fine-Tuning, Reinforcement Learning, Structured Agentic Task, Evidence Aggregation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce a unified reward modeling framework called Skill-RM that treats reward computation as a structured agentic task.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a consistent interface for orchestrating heterogeneous resources, dynamically selecting and aggregating evidence based on specific input requirements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Skill-RM outperforms traditional judge baselines in reward benchmarks and downstream applications by providing a unified solution for reward modeling with superior performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03980" target="_blank">https://huggingface.co/papers/2606.03980</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233358406.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>52. Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Bayesian-Agent, SOPs, hypotheses, task performance, model success</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to introduce Bayesian-Agent, a framework using Bayesian inference for enhancing agent behavior and task performance by treating reusable skills and SOPs as hypotheses for success.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes Bayesian inference to guide agent behavior and optimize task performance through posterior-guided harness optimization. The framework records trajectory evidence and maintains a categorical posterior over each skill.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Bayesian-Agent framework improves the performance of different benchmarks significantly, suggesting that agent skill evolution is more effective when viewed as posterior-guided harness optimization rather than uncalibrated prompt accumulation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08348" target="_blank">https://huggingface.co/papers/2606.08348</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233331053.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>53. AHA-WAM:Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AHA-WAM, dual Diffusion Transformers, asynchronous world-action model, horizon-adaptive offset training, Observation-Guided Video-Context Routing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop an Asynchronous Horizon-Adaptive World-Action Model (AHA-WAM) for efficient long-horizon planning and real-time action execution in robotic manipulation tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of dual Diffusion Transformers architecture to decouple temporal resolutions for world prediction and action execution.</p>
<p>   &#8211; Implementation of horizon-adaptive offset training and Observation-Guided Video-Context Routing for asynchronous execution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AHA-WAM achieved state-of-the-art performance, with 92.80% success on RoboTwin and 78.3% success in real-world tasks without robot-data pretraining.</p>
<p>   &#8211; The model demonstrated a 24.17 Hz closed-loop control with a 4.59x speedup over Fast-WAM.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09811" target="_blank">https://huggingface.co/papers/2606.09811</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233306701.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>54. OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-language model, OmniGameArena, reflection-based improvement, Unreal Engine 5, Improvement Dynamics Curve</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To establish a unified benchmark, OmniGameArena, for evaluating Vision-language model (VLM) agents across diverse game settings to track their performance evolution and skill generalization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced twelve games built with Unreal Engine 5 encompassing Solo, PvP, and Coop modes.</p>
<p>   &#8211; Developed a unified action interface and Improvement Dynamics Curve (IDC) which uses a tool-using reflector LLM to refine bounded skill prompts over multiple rounds.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated the effectiveness of IDC by reporting performance metrics on a cold-start leaderboard and additional observables, showcasing how agent scores evolve and how skills generalize across tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09826" target="_blank">https://huggingface.co/papers/2606.09826</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233241604.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>55. Echo-Memory: A Controlled Study of Memory in Action World Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Echo-Memory, memory mechanisms, action-conditioned world models, replay quality, state-space recurrence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the impact of memory structure and capacity on the performance of action-conditioned world models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Conducted a controlled study using Echo-Memory, varying only the memory storage and retrieval mechanisms while keeping other factors like the video diffusion backbone constant.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Raw context provides a robust capacity baseline, significantly enhancing open-domain return performance.</p>
<p>   &#8211; Compact memory designs, while efficient, can lead to the loss of essential evidence for accurate memory recall.</p>
<p>   &#8211; State-space recurrence stands out as the most effective mechanism for open-domain returns, demonstrating the critical role of implicit memory structure.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09803" target="_blank">https://huggingface.co/papers/2606.09803</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233211599.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>56. SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SpatialWorld, multimodal agents, spatial reasoning, partial observability, text-based actions</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introducing SpatialWorld as a unified benchmark to evaluate interactive spatial understanding in multimodal agents through diverse real-world tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Integration of eight heterogeneous simulation backends under a unified protocol, enabling tasks with vision-only partial observability and decision-making via a text-based action interface.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Highlighting challenges in robust spatial task solving, with the most advanced model achieving a low task success rate, revealing inefficiencies and performance variations across domain-specific tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.09669" target="_blank">https://huggingface.co/papers/2606.09669</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233145813.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>57. CoVEBench: Can Video Editing Models Handle Complex Instructions?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: CoVEBench, compositional video editing, multi-point editing instructions, video fidelity, video quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces CoVEBench, a benchmark designed to assess the capabilities of current models in compositional video editing, specifically focusing on handling complex and multi-step editing tasks while preserving spatiotemporal content.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; CoVEBench consists of 416 curated source videos, 626 multi-point editing instructions, and 9,990 fine-grained checklist items to evaluate models on instruction compliance and video fidelity using automated metrics for assessing video quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study finds that compositional video editing remains challenging; models often fail to implement all edits correctly, breach preservation constraints, or generate artifacts when executing multiple operations at the same time.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.08415" target="_blank">https://huggingface.co/papers/2606.08415</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260609233119393.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>58. LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LatentSkill, LoRA adapters, weight space, semantic geometry, parameter-space arithmetic</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective of the research is to develop LatentSkill, a framework designed to efficiently convert textual skills into LoRA adapters for agent systems, reducing context overhead while maintaining modularity and composability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The researchers utilized LatentSkill to transform textual skills into plug-and-play LoRA adapters using a pretrained hypernetwork, allowing these skills to be stored in weight space instead of context space.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LatentSkill was shown to outperform in-context skill baselines in specific benchmarks, achieving significant improvements in task success and efficiency in ALFWorld and Search-QA. It demonstrated that weight-space skills are efficient, modular, and offer less exposure compared to context-space skills.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06087" target="_blank">https://huggingface.co/papers/2606.06087</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233044680.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>59. On the Geometry of On-Policy Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-policy distillation, Parameter space, Subspace locking, Reinforcement learning, Supervised fine-tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research investigates the unique geometric patterns in parameter space dynamics of On-policy distillation (OPD) and compares them with supervised fine-tuning and reinforcement learning with verifiable rewards.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study uses parameter-space diagnostics to compare the trajectory of OPD updates in parameter space with other methods, highlighting subspace locking and relaxed off-principal updates.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OPD forms a distinct update geometry, characterized by fewer weight updates and subspace locking, which is functionally sufficient for OPD but not for supervised fine-tuning. The study highlights how OPD&#8217;s dynamics are unique and not merely intermediate between other methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07082" target="_blank">https://huggingface.co/papers/2606.07082</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260609233019306.png"></figure>
</p>
</div>
<div style="height:30px"></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260609233056967.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260609233753796.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260609233920723.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260609234313480.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260609233119393.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>Global AI Native Industry Insights &#8211; 20260609 &#8211;  Cognition &#124; Google &#124; Nvidia &#124; more</title>
		<link>https://ainativefoundation.org/global-ai-native-industry-insights-20260609-cognition-google-nvidia-more/</link>
		
		<dc:creator><![CDATA[AINF]]></dc:creator>
		<pubDate>Tue, 09 Jun 2026 09:34:11 +0000</pubDate>
				<category><![CDATA[Global Industry]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/global-ai-native-industry-insights-20260609-cognition-google-nvidia-more/</guid>

					<description><![CDATA[Explore FrontierCode benchmark, Google NotebookLM upgrades, and Apple Cloud expansion.]]></description>
										<content:encoded><![CDATA[<p>Explore FrontierCode benchmark, Google NotebookLM upgrades, and Apple Cloud expansion. Discover more in Today’s Global AI Native Industry Insights.</p>
<h3>1.  Cognition launches FrontierCode benchmark to evaluate AI code mergeability beyond correctness</h3>
<p>Cognition introduces FrontierCode, a new AI coding benchmark developed with 36 open-source maintainers who each invested over 40 hours per task. Unlike traditional benchmarks that focus on functional correctness, FrontierCode measures whether AI-generated code would actually be accepted by human maintainers for production codebases. The benchmark evaluates code along six axes including behavioral correctness, regression safety, mechanical cleanliness, test correctness, scope discipline, and code quality. Initial results show that over half of outputs passing earlier SWE-Bench tests fall short on mergeability standards, with even advanced models like Anthropic&#8217;s Claude Opus 4.8 scoring only 13.4% on the most difficult Diamond tasks.</p>
<p>Read more: <a href="https://cognition.ai/blog/frontier-code">https://cognition.ai/blog/frontier-code</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260609_52ae58b4a25b4fa5bc80734303cf8c19.jpg"><source src="https://cdn.ainative.foundation/video/20260609_en_frontierCode.mp4" type="video/mp4"></video></p>
<p>Video Credit: NotebookLM</p>
<h3>2.  Google upgrades NotebookLM with agentic capabilities, Gemini 3.5, and new output formats</h3>
<p>Google announced major upgrades to NotebookLM that add agentic AI capabilities, advanced reasoning with Gemini 3.5, and ability to generate multiple file formats. The tool can now autonomously search for sources, write and run code through a secure cloud computer, and create downloadable outputs like PDFs, spreadsheets and presentations. The upgrade transforms NotebookLM from a document summarizer into an autonomous research assistant that can handle complex multi-step research projects. The new capabilities are rolling out to Google AI Ultra subscribers and Workspace customers with expanded access.</p>
<p>Read more: <a href="https://blog.google/innovation-and-ai/products/notebooklm/better-research-notebooklm/">https://blog.google/innovation-and-ai/products/notebooklm/better-research-notebooklm/</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260609_3b7aa4daa0924ca2abe4bd2e9f01c4ae.jpg"><source src="https://cdn.ainative.foundation/video/20260609_3458dfdbaa7f4731bbd3f63de28524f1.mp4" type="video/mp4"></video></p>
<p>Video Credit: @NotebookLM on X</p>
<h3>3.  Apple expands Private Cloud Compute to Google Cloud using NVIDIA GPUs for Apple Intelligence workloads</h3>
<p>Apple announced it is expanding Private Cloud Compute beyond its own data centers to Google Cloud, using NVIDIA GPUs to run new Apple Intelligence workloads. The collaboration extends Apple&#8217;s privacy protections to third-party data centers for the first time, utilizing NVIDIA Confidential Computing with NVIDIA GPUs, Intel CPUs with TDX, and Google&#8217;s Titan chip. Apple worked with Google to leverage Gemini model technologies for building next-generation Apple Foundation Models, with the most demanding tasks including agentic tool-use and complex reasoning running on NVIDIA GPUs in Google Cloud while maintaining Apple&#8217;s security and privacy protections.</p>
<p>Read more: <a href="https://security.apple.com/blog/expanding-pcc/">https://security.apple.com/blog/expanding-pcc/</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/video/20260609_en_nvdia_img.png"><source src="https://cdn.ainative.foundation/video/20260609_en_nvidia.mp4" type="video/mp4"></video></p>
<p>Video Credit: NotebookLM</p>
<div style="width:100%;height:2px;background:#808080;margin:10px 0"></div>
<p>That’s all for today’s Global AI Native Industry Insights. Join us at <a href="https://member.ainativefoundation.org/">AI Native Foundation Membership Dashboard</a> for the latest insights on AI Native, or follow our linkedin account at <a href="https://www.linkedin.com/company/ainativefoundation/">AI Native Foundation</a> and our twitter account at <a href="https://x.com/AINativeF">AINativeF</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/video/20260609_en_frontierCode.mp4" length="12753884" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260609_3458dfdbaa7f4731bbd3f63de28524f1.mp4" length="7381302" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260609_en_nvidia.mp4" length="7985187" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260608</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260608/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Tue, 09 Jun 2026 00:41:22 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260608/</guid>

					<description><![CDATA[1. Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings 🔑 Keywords: Large language models, EmbedFilter, text embeddings, high-frequency tokens, [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language models, EmbedFilter, text embeddings, high-frequency tokens, dimensionality reduction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study aims to address the deficiency in large language models&#8217; embedding capabilities by introducing EmbedFilter, a linear transformation that enhances semantic representations and enables dimensionality reduction.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors identified that text embeddings tend to align with frequent, uninformative tokens, and thus, applied EmbedFilter to suppress the influence of these high-frequency tokens, refining the semantic quality of the embeddings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments show that models integrated with EmbedFilter achieve better zero-shot performance on downstream tasks, even with reduced embedding dimensions, suggesting enhanced efficiency and quality of semantic representations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07502" target="_blank">https://huggingface.co/papers/2606.07502</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233013089.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>2. AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: egocentric simulation, 3D human motion, spatial grounding, self-evolving worlds, anchor views</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance egocentric simulation through improved interaction integrity and world customization using 3D human motion and anchor view definitions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of 3D human motion as the primary interaction modality, and incorporation of auxiliary training supervision with exogenous viewpoints to improve spatial grounding of human-world interactions.</p>
<p>   &#8211; Introduction of a mechanism for customizing self-evolving worlds by defining anchor views within a unified world coordinate system and using textual descriptions for dynamic evolution of scenes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AnchorWorld outperforms state-of-the-art baselines, with ablation studies supporting the effectiveness of its designs. The proposed customization scheme shows promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07326" target="_blank">https://huggingface.co/papers/2606.07326</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233128477.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>3. GENEB: Why Genomic Models Are Hard to Compare</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: GENEB, genomic foundation models, diagnostic benchmark, probing-based protocol, model rankings</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce GENEB, a comprehensive benchmark for evaluating genomic foundation models across diverse tasks and architectures under a unified protocol.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementation of a large-scale diagnostic benchmark called GENEB, which evaluates frozen representations from 40 genomic foundation models across 100 tasks spanning 13 functional categories using a unified probing-based protocol.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current evaluation practices show limitations, as model rankings vary sharply across task categories, with scale providing modest and inconsistent gains. GENEB is positioned as a reference framework for principled comparison and category-aware model selection in genomic machine learning.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04525" target="_blank">https://huggingface.co/papers/2606.04525</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233044347.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>4. Robots Need More than VLA and World Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Generalist robot intelligence, unstructured behavioral data, embodiment mapping, world modeling, reward inference</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper argues for a shift from focusing solely on policy scaling to incorporating unstructured behavioral data through specialized interfaces to enhance robot intelligence.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study highlights the need for interfaces for autolabelling unstructured behavior, retargeting human motion, 3D reasoning for world modeling, and inferring rewards.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The authors propose a research agenda for building robotic systems capable of learning from the broader physical world, not just robot demonstrations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06556" target="_blank">https://huggingface.co/papers/2606.06556</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233222167.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>5. OpenSkill: Open-World Self-Evolution for LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: OpenSkill, self-evolving agents, open-world deployment, verification signals, transferable skills</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study introduces OpenSkill, aiming to enable agents to develop skills and verification signals independently using open-world resources, without relying on target-task supervision.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; OpenSkill employs a framework to bootstrap the learning loop by acquiring grounded knowledge and verification anchors from various sources and synthesizing them into transferable skills.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OpenSkill showcases high automated performance across benchmarks without breaching the no-supervision constraint, effectively transferring skills across models and aligning self-built verifiers with ground-truth outcomes.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06741" target="_blank">https://huggingface.co/papers/2606.06741</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233254830.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>6. UniSHARP: Universal Sharp Monocular View Synthesis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI Native, universal monocular rendering, omnidirectional latent space, Gaussian primitives, photorealistic view synthesis</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to extend SHARP for universal monocular rendering across various camera systems by aligning images in an omnidirectional latent space.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; UniSHARP is proposed, which performs implicit alignment in both feature and Gaussian spaces using Gaussian primitives arranged in a ray-based universal representation. A benchmark stratified by field of view is constructed for evaluation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; UniSHARP demonstrates superior performance in universal monocular rendering across diverse imaging systems, outperforming alternative methods by a large margin.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07514" target="_blank">https://huggingface.co/papers/2606.07514</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233324682.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>7. LIMMT: Less is More for Motion Tracking</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Motion Tracking, High-Quality Data, Data-Centric Study, Physics-Based Humanoid Motion Tracking, Data Cleaning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to improve tracking policy optimization for physics-based humanoid motion tracking using high-quality motion data, specifically by utilizing minimal data subsets to outperform full datasets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of LIMMT (Less Is More for Motion Tracking) as a framework, focusing on data quality defined by physics feasibility, diversity, and complexity. Data cleaning on web-sourced mocap data was also conducted.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated that less than 3% of the AMASS dataset yields better tracking performance than the full dataset, and extensive experiments validate the effectiveness of the LIMMT framework.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06953" target="_blank">https://huggingface.co/papers/2606.06953</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233350591.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>8. dots.tts Technical Report</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Continuous Autoregressive, AudioVAE, Flow-Matching Head, Low-Latency Speech Generation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The goal is to develop a state-of-the-art continuous autoregressive text-to-speech model, dots.tts, capable of efficient low-latency speech generation across multiple languages.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a novel training approach with AudioVAE for a semantically structured continuous speech space.</p>
<p>   &#8211; Incorporates full-history conditioning and reward-free self-corrective post-training to enhance robustness and acoustic quality.</p>
<p>   &#8211; Applies CFG-aware MeanFlow distillation to minimize latency in speech generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The model, trained on a large multilingual corpus, shows superior performance on Seed-TTS-Eval benchmark with impressive WERs and SIM scores.</p>
<p>   &#8211; Achieves open-source state-of-the-art results on multiple benchmarks, showcasing strong stability, voice cloning, and emotional expressiveness.</p>
<p>   &#8211; Efficient inference is possible with dual-streaming modes, facilitating practical deployment and reproducible research.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07080" target="_blank">https://huggingface.co/papers/2606.07080</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233416748.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>9. PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PaperFlow, scientific paper recommendation, profiling, interest drift, multi-signal aggregation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a framework called PaperFlow for recommending scientific papers by processing user profiles, daily paper streams, and addressing interest drift through a three-stage process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented a longitudinal benchmark with 24 users, 50 daily streams, and 1,200 episodes to evaluate PaperFlow.</p>
<p>   &#8211; Organized the framework into three stages: Profiling, Recommending, and Adapting.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PaperFlow demonstrates superior oracle-based ranking, high behavioral alignment with simulated reading selections, and outperforms scientific recommendation baselines in blind human-evaluation scores.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07454" target="_blank">https://huggingface.co/papers/2606.07454</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233511781.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>10. Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Astra, Vision-Language Models, action-conditioned visual imagination, world simulator, spatial reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance Vision-Language Models with action-conditioned visual imagination through a spatial reasoning framework called Astra.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Astra employs a reinforcement learning-trained policy coupled with a Bagel-based world simulator to generate novel-view observations, utilizing view consistency tuning and a world-simulator-in-the-loop two-phase RL curriculum.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The Astra framework significantly improves spatial reasoning by providing useful imagined observations, demonstrating improvements on benchmarks such as MMSI-Bench and MindCube; effective reasoning requires learning the optimal use of imagined evidence.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06476" target="_blank">https://huggingface.co/papers/2606.06476</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233442405.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>11. Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: 3D vision-language model, autoregressive control modeling, Visual-Spatial Feature Integration, Geometry-Adaptive Voxel Compression</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper presents an online 3D vision-language model aimed at achieving real-time spatial understanding from streaming video.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes autoregressive streaming control modeling to determine response timing.</p>
<p>   &#8211; Employs a Visual-Spatial Feature Integration (VSFI) module to incrementally inject geometry priors.</p>
<p>   &#8211; Proposes a Geometry-Adaptive Voxel Compression (GAVC) module for efficient visual token compression.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Extensive experiments demonstrate the model&#8217;s superior performance over existing proprietary and open-source models in tasks related to 3D spatial understanding, reasoning, and grounding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06891" target="_blank">https://huggingface.co/papers/2606.06891</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260608233602529.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>12. SIA: Self Improving AI with Harness &amp; Weight Updates</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-Improving AI, Language-Model Agent, Task-Specific Agent, GPU Optimization, Biological Data Denoising</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a self-improving AI framework that can update both the model weights and task-specific agent architecture using a language-model feedback agent across diverse tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced SIA, a self-improving loop that simultaneously updates the harness and weights of a task-specific agent across different domains such as legal classification, GPU optimization, and biological data denoising.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method, combining both harness and weight updates, outperformed traditional scaffold-only iterations across various tasks, achieving significant improvements in benchmarks like LawBench, GPU kernel runtime, and RNA denoising.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.27276" target="_blank">https://huggingface.co/papers/2605.27276</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233537884.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>13. Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: post-hoc compression, knowledge distillation, accuracy-efficiency trade-off, reasoning traces</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To explore the benefits of post-hoc compression of reasoning traces for more efficient and cost-effective knowledge distillation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Two instruction-tuned models were used to compress reasoning traces from large teacher models, reducing them to 8.6-21.0% of their original length. Experiments conducted included main grid runs and truncation ablations to compare efficiency and accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Compressed traces significantly reduce training time and token usage while maintaining a high level of accuracy. Raw traces retain the highest accuracy, but compressed models provide substantial efficiency improvements, up to 18x per token efficiency, especially beneficial for smaller models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05988" target="_blank">https://huggingface.co/papers/2606.05988</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233849955.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>14. ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ECI_sem, dense retrieval, semantic residual, BEIR, MS MARCO</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce ECI_sem, a semantic residual variant of Effective Contrastive Information, to rank negative sources for dense retrieval without training, using frozen embeddings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; ECI_sem constructs a weighted residual information matrix based on target consistency, semantic locality, lexical residuality, and log-determinant diversity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ECI_sem achieves strong performance on MS MARCO and BEIR benchmarks, with high alignment depending on the target encoder and stability under various perturbations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2603.20990" target="_blank">https://huggingface.co/papers/2603.20990</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233822818.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>15. Towards Retrieving Interaction Spaces for Agentic Search</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RISE framework, BM25, agentic search, corpus exploration, interaction space</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to develop the RISE framework for efficient corpus exploration by constructing bounded interaction spaces that maintain high accuracy at scale.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study combines BM25 retrieval with preprocessed document indexing to create an interaction space for agentic search, optimizing for shell-style navigation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The RISE framework, when evaluated on BrowseComp-Plus, demonstrated comparable accuracy to the pure-shell DCI baseline at 78% accuracy with lesser costs and outperformed it significantly in larger corpus settings, achieving 81% accuracy on a 1M document set.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06880" target="_blank">https://huggingface.co/papers/2606.06880</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233752496.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>16. A Cookbook of 3D Vision: Data, Learning Paradigms, and Application</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: 3D vision, geometric representations, learning frameworks, datasets, multimodal geometric grounding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; This study aims to create a data-centric taxonomy for 3D vision, integrating key elements such as geometric representations, datasets, learning frameworks, and applications into a unified conceptual map.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The methods include analyzing various structural representations of 3D data like point clouds, meshes, voxels, and 3D Gaussians, and exploring dataset design, benchmark construction, and supervision regimes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research provides a clarified view of the intricate interactions between representations, learning paradigms, and tasks, highlighting the trends toward balancing efficiency and fidelity and emphasizing multimodal geometric grounding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04291" target="_blank">https://huggingface.co/papers/2606.04291</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233728936.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>17. When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multi-objective LLM judge customization, textual gradients, Gradient specificity, instruction interference, Spearman&#8217;s rho</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to customize a large language model (LLM) judge for specific tasks or domains by optimizing its prompts across multiple evaluation criteria using textual gradients.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The authors tested five decomposition modes of textual gradient optimizers by altering the shared cross-task information between loss, gradient, and optimizer LLMs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research identified two key failure modes: gradient dilution during optimization and instruction interference during inference, which limit the effectiveness of multi-objective judge customization with textual feedback.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.26046" target="_blank">https://huggingface.co/papers/2605.26046</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233701916.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>18. HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM agents, meta-adaptation, HarnessForge, co-evolution, harness-conditioned policy alignment</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To address the challenges faced by LLM agents in heterogeneous task regimes by proposing a meta-adaptive framework, HarnessForge, which facilitates the co-evolution of agent systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of a stable adaptation space through harness&#8211;policy pairs separating execution structure from reasoning behavior, and employing fault-guided harness tailoring and harness-conditioned policy alignment for co-evolution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; HarnessForge enhances the performance of LLM agents like Qwen3-4B and Qwen3-8B, achieving up to 12.0% improvement over baselines, and emphasizes the importance of harmonizing harness and policy to optimize agent-system adaptability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.01779" target="_blank">https://huggingface.co/papers/2606.01779</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233635780.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>19. Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AI tools, LLMs, code implementation, bug fixing, human oversight</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to analyze how AI tools, particularly LLMs, are utilized by developers in real-world software development workflows and the evolution of AI-assisted code over time.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Analysis of 35,361 GitHub code comments referencing AI use, deriving a taxonomy of AI-assisted development activities, and annotating the dataset using LLM-based classifiers. Additionally, the study examines 12,996 subsequent commit messages to understand the evolution of AI-assisted code.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The findings indicate that developers primarily use LLMs for tasks like code implementation, enhancement, debugging, and documentation, with sustained human oversight through refactoring and bug fixing. AI tools are increasingly seen as collaborative support mechanisms, shifting from direct code generation to enhancing conceptual support over time.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06843" target="_blank">https://huggingface.co/papers/2606.06843</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608234009985.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>20. Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Interactive ASR, semantic correction, multi-turn refinement, Sentence-level Semantic Error Rate, reasoning-based editing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Human-AI Interaction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The primary objective is to reduce semantic errors in Automatic Speech Recognition (ASR) through the integration of semantic correction and reasoning-based editing in a multi-turn refinement process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces the Agentic ASR framework which combines a single-pass ASR front-end with semantic correction, intent routing, and reasoning-based editing, validated through a new Sentence-level Semantic Error Rate (S^2ER) metric and an Interactive Simulation System.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Iterative interaction in multilingual, named-entity-intensive, and code-switching benchmarks significantly reduces semantic errors more effectively in S^2ER than conventional token-level metrics, demonstrating enhanced alignment and robustness of the proposed framework.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.29430" target="_blank">https://huggingface.co/papers/2605.29430</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233943791.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>21. Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Fisher Information Matrix, spectral norm, robustness metric, deep neural networks, adversarial vulnerability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Foundations of AI</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to introduce a novel attack-agnostic robustness metric for deep neural networks using the spectral norm of the Fisher Information Matrix.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study develops scalable evaluation methods such as power iteration and Hutchinson-based estimation for robustness assessment across different architectures, including VGG, ResNet, DenseNet, and Transformer in both white-box and black-box settings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The research demonstrates a strong correlation between the proposed metric and adversarial vulnerability, suggesting that the framework serves as an interpretable diagnostic tool for complementing attack-based evaluations and guiding robust model design.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04767" target="_blank">https://huggingface.co/papers/2606.04767</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233917653.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>22. WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: WorldBench, Multimodal Large Language Models, Visual Diversity, Reasoning Benchmark</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to introduce WorldBench, a visually diverse reasoning benchmark for evaluating Multimodal Large Language Models (MLLMs) and to reveal limitations in current models&#8217; visual understanding capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study involves constructing a taxonomy of thousands of visual concepts across multiple domains, and curating a broad collection of images from search engines and datasets to represent the visual world comprehensively. It uses structured trial-and-error to design challenging questions for MLLMs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; WorldBench demonstrates higher visual diversity compared to existing benchmarks, revealing weaknesses in visual understanding where even the strongest MLLMs only reach a 64.0% accuracy, emphasizing the importance of visual diversity in building multimodal benchmarks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06538" target="_blank">https://huggingface.co/papers/2606.06538</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608234031710.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>23. The Distillation Game: Adaptive Attacks &amp; Efficient Defenses</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Distillation attacks, Minimax game, Adaptive evaluation, Defense strategy, Product-of-Experts</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To study the trade-off between model utility and vulnerability to imitation attacks through a minimax game framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a minimax game between a utility-constrained teacher and an adaptive student to explore defensive strategies, including adaptive evaluation and a forward-pass-only defense called Product-of-Experts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The adaptive student recovers more capabilities than passive evaluation reveals, narrowing the robustness gap between costly defenses and the cheaper Product-of-Experts.</p>
<p>   &#8211; The study suggests that strong distillation prevention requires evaluation against adaptive students for progress in antidistillation efforts.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.22737" target="_blank">https://huggingface.co/papers/2605.22737</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608234055884.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>24. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202606081780962079.jpg"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>25. Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RAT+, Memory Module, Sparse Inference, Long-Context Language Models, Query-Aware  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing  </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study investigates whether the RAT+ memory module can enhance accuracy in query-aware sparse inference methods for long-context language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The paper utilizes RAT+ for various representative methods including Quest, MoBA, and SnapKV, validating the improvements in accuracy across different sparse budgets and tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; RAT+ consistently improves accuracy over standard attention in eight needle-in-a-haystack tasks and is verified on both RAT+ released checkpoints and continued pretraining on OLMo2-7B with a new memory module.</p>
<p>   &#8211; Two hypotheses are proposed and supported by targeted experiments to explain the benefits of this memory module for query-aware sparse inference.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.28640" target="_blank">https://huggingface.co/papers/2605.28640</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608234109812.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>26. Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Agentic search, Retrieval models, Critic model, Feedback loop, Query refinement</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The main aim is to enhance agentic search by improving the interaction between reasoning agents and retrieval models through a feedback loop mechanism.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces Critic-R, a framework utilizing a critic model to evaluate reasoning and retrieval outcomes via dual optimization mechanisms: Critic-R-Zero for inference-time query refinement and Critic-Embed for optimizing retrieval models using automatic supervision.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Critic-R significantly improves retrieval quality and answer accuracy, as demonstrated by evaluations on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.00590" target="_blank">https://huggingface.co/papers/2606.00590</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608234043177.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>27. How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Music Transformer, harmonic prediction, LoRA, IA3, genre adaptation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate the effectiveness of small adaptation interfaces in extending a frozen Music Transformer model to handle multiple genres in harmonic prediction.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study compared five methods, including LoRA and IA3, across a complete 165-cell grid for 11 genres and three seeds, analyzing improvements in chord prediction.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; All methods improved the model over the base, with LoRA and IA3 scoring highest. Chord-symbol adaptation is shown to reliably enhance genre-local harmonic prediction, but it does not fully represent genre identity.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07334" target="_blank">https://huggingface.co/papers/2606.07334</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608234021513.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>28. CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Contrastive Reflection, reasoning tasks, verifiable rewards, natural-language insights</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to enhance AI Native reasoning capabilities in language models by using Contrastive Reflection to generate concise and interpretable insights for model self-improvement.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced Contrastive Reflection (CORE), a non-parametric algorithm analyzing differences in reasoning traces to derive insights, allowing efficient and faster reasoning task improvements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CORE demonstrated rapid and cost-effective performance improvements across various reasoning tasks compared to traditional parametric and non-parametric methods. </p>
<p>   &#8211; It achieves comparable or superior outcomes with limited training samples and is more context-efficient, offering a more interpretable path to model self-improvement than existing approaches.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.28742" target="_blank">https://huggingface.co/papers/2605.28742</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233955289.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>29. Parametric Social Identity Injection and Diversification in Public Opinion Simulation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large language models, public opinion simulation, Diversity Collapse, Parametric Social Identity Injection, representation-level control</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Address the issue of reduced social diversity in public opinion simulations with large language models by introducing a parametric framework to enhance demographic representation fidelity and diversity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented a Parametric Social Identity Injection framework to inject explicit demographic and value-oriented representations into LLMs.</p>
<p>   &#8211; Conducted extensive experiments using the World Values Survey and multiple open-source LLMs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method significantly improves the distributional fidelity and diversity of simulated public opinion data, reducing KL divergence and enhancing overall diversity, offering new insights into scalable, diversity-aware simulations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2603.16142" target="_blank">https://huggingface.co/papers/2603.16142</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233930107.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>30. Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Imaginative Perception Tokens, spatial reasoning, Vision language models, Perspective Taking, Path Tracing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance vision-language models&#8217; spatial reasoning capabilities by utilizing Imaginative Perception Tokens (IPT), which provide intermediate perceptual representations for improved interpretation of unseen viewpoints.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Three tasks, Perspective Taking, Path Tracing, and Multiview Counting, were formulated and tested using datasets with approximately 20K examples and the unified vision-language model BAGEL as the backbone.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; IPT supervision enhances spatial reasoning and often outperforms traditional text-based methods, improving accuracy by 3.4% on Multiview Counting and showing competitive performance on Path Tracing. Combining IPT with label-only supervision further improves results, whereas textual chain of thought training could hinder performance.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03988" target="_blank">https://huggingface.co/papers/2606.03988</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260608233902846.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>31. Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Autoregressive Models, Diffusion Language Models, On-Policy Distillation, Train-Inference Mismatch, Bidirectional Attention</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To transform autoregressive language models (ARLMs) into diffusion language models (DLMs) using on-policy distillation to address train-inference mismatch and reduce training token requirements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implementing an On-Policy Diffusion Language Model (OPDLM) where on-policy distillation (OPD) is used for transforming ARLMs to DLMs, incorporating bidirectional attention for generating trajectories and using original ARLMs for knowledge distillation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OPDLM significantly reduces the need for training tokens (15x to 7,000x fewer) while maintaining strong performance across various tasks, thus eliminating the high cost of DLM pretraining and improving knowledge retention from ARLMs.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06712" target="_blank">https://huggingface.co/papers/2606.06712</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233837192.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>32. Streaming Video Generation with Streaming Force Control</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: StreamForce, causal model, video generation, distillation pipeline, autoregressive efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce StreamForce, a streaming video generation framework that provides real-time, physically grounded responses to time-varying forces.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Design a unified force representation as a control signal and develop a distillation pipeline for force-controllable video generation.</p>
<p>   &#8211; Combine autoregressive efficiency with force responsiveness to achieve stable photometric and dynamic realism.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; StreamForce achieves state-of-the-art performance in force adherence and motion realism, running at up to 16.6 FPS on a single GPU.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07508" target="_blank">https://huggingface.co/papers/2606.07508</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260608233804215.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>33. LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LayerRoute, transformer blocks, LoRA adapters, inference, compute savings</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a lightweight adapter, LayerRoute, that selectively skips transformer blocks during inference to save computational resources while maintaining or improving model quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized LayerRoute with a per-layer router and LoRA adapters for gated routing on Qwen2.5-0.5B-Instruct, alongside a single end-to-end training pass on agentic data with gate regularization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Achieved a 12.91% skip differential in FLOPs: tool calls skip 15.25% of FLOPs, while planning steps skip 2.34%. Quality improved over the base model due to LoRA adaptation, with reduced perplexity for both tool calls and planning steps.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.01838" target="_blank">https://huggingface.co/papers/2606.01838</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233740930.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>34. Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Confidence-based loss weighting, generative models, entropy, diffusion training, Stable Audio 3</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to improve audio generation through adaptive gradient scaling using confidence-based loss weighting in supervised diffusion training.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces the Eisbach log-barrier, a parameter-free weight derived from the entropy of the DiT output&#8217;s spatial energy distribution, influencing gradient dampening or preservation.</p>
<p>   &#8211; Applies this method to LoRA fine-tuning of Stable Audio 3 Medium on MusicCaps.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Achieves stronger thematic development, clearer acoustic differentiation, and higher textural diversity in audio generation.</p>
<p>   &#8211; Demonstrates the emergence of a self-referential data curriculum purely from the forward pass with testable predictions.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07207" target="_blank">https://huggingface.co/papers/2606.07207</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233715644.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>35. Reinforcement Learning from Rich Feedback with Distributional DAgger</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Forward Cross-Entropy, Distributional Imitation Learning, Monotonic Policy Improvement, Reasoning Tasks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Enable monotonic policy improvement and enhance performance in reasoning tasks through forward cross-entropy objective with distributional imitation learning compared to traditional reinforcement learning methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of a distributional variant of the classic imitation learning algorithm DAgger, where learners have local access to expert distribution on visited states for the current policy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Forward cross-entropy provides monotonic policy improvement and guarantees on regret, illustrating improvements over traditional RL and RL with self-distillation baselines in areas like scientific reasoning, coding, and solving complex mathematical problems.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05152" target="_blank">https://huggingface.co/papers/2606.05152</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233648551.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>36. Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-Language Models, multimodal benchmark, cognitive asymmetry, cross-lingual multimodal reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce BloomBench, a cognitively grounded bilingual multimodal benchmark to reveal cognitive asymmetries and cross-lingual performance gaps in Vision-Language Models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize Bloom&#8217;s Taxonomy to systematically evaluate six cognitive levels through image-question-answer tasks, employing a semi-automated pipeline and hybrid quality assurance protocol.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identify strong performance in semantic understanding but weaknesses in factual recall and creative synthesis, highlighting cognitive asymmetries and performance gaps between languages in current models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05531" target="_blank">https://huggingface.co/papers/2606.05531</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233619451.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>37. SPACENUM: Revisiting Spatial Numerical Understanding in VLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Vision-Language Models, Spatial Numerical Understanding, Coordinate-Aware Representations</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study revisits spatial numerical understanding of Vision-Language Models (VLMs) using the framework SpaceNum for evaluating map capabilities between spatial structures and numerical representations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Formulated bidirectional tasks (Num2Space and Space2Num) to systematically study if VLMs understand numerical values in spatial settings.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current VLMs fail to ground numerical values in spatial meaning, perform near random guesses, and continue to rely on shallow spatial cues instead of developing stable coordinate-aware representations. Explicit reasoning provides marginal improvements, but tuning can partially enhance spatial numerical understanding.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.23898" target="_blank">https://huggingface.co/papers/2605.23898</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233550536.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>38. Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Socratic-SWE, self-evolving software engineering agents, historical solving traces, closed-loop self-evolution framework, repair patterns</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance LLM-driven software engineering agents using Socratic-SWE, which generates targeted repair tasks by leveraging historical solving traces.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implement a closed-loop self-evolution framework that distills historical solving traces into structured agent skills, guiding the generation of repair tasks.</p>
<p>   &#8211; Validate tasks through execution-based validation and solver-gradient alignment rewards.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Socratic-SWE improves self-evolution in software engineering agents across various benchmarks, achieving significant performance gains on constrained budgets.</p>
<p>   &#8211; The approach demonstrates that solving traces can be a scalable substrate for enhancing the capabilities of self-evolving SWE agents.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07412" target="_blank">https://huggingface.co/papers/2606.07412</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233525430.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>39. Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PhaseLock, Image-to-Video diffusion models, physical consistency, motion priors, Latent Delta Guidance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To improve physical consistency in image-to-video diffusion models by preserving motion priors during the denoising process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a training-free approach called PhaseLock to maintain motion priors from early-step inference throughout the denoising trajectory, using spectral analysis and Latent Delta Guidance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; PhaseLock effectively mitigates phase degradation, improving physical consistency by an average of 6.2 points across diverse models, while preserving visual fidelity with minimal computational overhead.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06361" target="_blank">https://huggingface.co/papers/2606.06361</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233456019.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>40. LLM Explainability with Counterfactual Chains and Causal Graphs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Causal graphs, Large Language Models, concept discovery, counterfactual augmentation, concept-level explainability</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to model Large Language Model inference processes using causal graphs to enhance transparency and explainability.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; A four-phase method involving concept discovery, mapping, and MCMC-inspired counterfactual augmentation is proposed to construct interpretable graphs, applied across various tasks including disease diagnosis and sentiment analysis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The discovered causal graphs reflect meaningful dependencies aligned with LLMs&#8217; reasoning, supporting concept-level explainability of language models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05972" target="_blank">https://huggingface.co/papers/2606.05972</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233430556.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>41. Watch, Remember, Reason: Human-View Video Understanding with MLLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Multimodal large language models, video understanding, perceptual representations, memory modeling, reasoning traces</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to transform video understanding through Multimodal Large Language Models (MLLMs) that handle complex video scenarios by focusing on watching, remembering, and reasoning capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces a human-view perspective, organizing LLMs by their roles in video tasks: perceptual representation, memory states, and reasoning. Challenges are identified in areas such as spatio-temporal perception and memory modeling.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; It offers insights into the future of scalable, memory-aware video intelligence, emphasizing the development of unified models for comprehensive video analysis and the exploration of application domains such as sports and medical videos.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07433" target="_blank">https://huggingface.co/papers/2606.07433</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233403835.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>42. UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: UnpredictaBench, Large Language Models, Simulation, Output Diversity, Distributional Sampling</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate the capacity of large language models (LLMs) in simulating target distributions and assessing the unpredictability of systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of UnpredictaBench, which includes 448 problems aimed at testing LLMs&#8217; ability to sample outcomes from individual target distributions, using the KS@N evaluation metric to quantify performance.</p>
<p>   &#8211; Utilization of the Kolmogorov-Smirnov test to measure how well LLMs&#8217; samples approximate target distributions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Significant variations exist among models in their distributional capabilities, with scores indicating room for improvement in distributional sampling.</p>
<p>   &#8211; Current advancements in reasoning and output diversity have yet to provide a complete solution for accurate distributional simulations, highlighting ongoing challenges.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06622" target="_blank">https://huggingface.co/papers/2606.06622</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233337888.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>43. SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Persistent AI assistants, memory relations, long-term memory, relational memory, SubtleMemory</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate AI agents&#8217; capacity to manage complex relational memory structures using the SubtleMemory benchmark, focusing on long-term memory relations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced SubtleMemory, a benchmark specifically designed to assess fine-grained relational memory discrimination in prolonged AI interactions, consisting of 1,522 evaluation instances and grounded in 1,090 memory-variant sets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Existing memory systems exhibit limitations in discriminating fine-grained relational memory, and distinct capability profiles emerge across preservation, retrieval, and reasoning stages.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05761" target="_blank">https://huggingface.co/papers/2606.05761</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233308722.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>44. When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: ToolMaze, Tool-Integrated Reasoning (TIR), implicit semantic failures, dynamic replanning, agentic fault-tolerance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce ToolMaze, a benchmark designed for dynamic path discovery and error recovery in Tool-Integrated Reasoning (TIR) agents, addressing real-world tool failures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Employ a two-dimensional design incorporating DAG-based topological complexity and a 2&#215;2 taxonomy of tool perturbations (explicit/implicit, transient/permanent) to evaluate performance under various conditions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Real-world tool failures, especially implicit semantic ones, significantly degrade TIR performance, with an approximate 37% drop in Perturbation Recovery Rate; dynamic replanning emerges as a crucial bottleneck inadequately addressed by model scaling or prompting.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05806" target="_blank">https://huggingface.co/papers/2606.05806</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233242321.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>45. Direct 3D-Aware Object Insertion via Decomposed Visual Proxies</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Object Insertion, Diffusion-based Methods, Pose Control, High-fidelity 2D Image Synthesis</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces DIRECT, a novel framework designed to enable pose-controllable object insertion with high-fidelity 2D image synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The method utilizes decomposed guidance components comprising appearance guidance, geometry guidance, and context guidance to ensure accurate pose manipulation and visual detail integration.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed method, DIRECT, demonstrates superior performance in geometric controllability and visual quality compared to previous approaches, with the help of an automated data construction pipeline enhancing data diversity and quality.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06601" target="_blank">https://huggingface.co/papers/2606.06601</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260608233139982.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>46. MMAE: A Massive Multitask Audio Editing Benchmark</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Instruction-based Audio Editing, MMAE, Multitask Audio Editing, Audio Modalities</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduction of MMAE as a comprehensive benchmark for instruction-based audio editing across various modalities and complexity levels.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The benchmark includes a taxonomy covering 7 audio modalities, 6 task complexity levels, and 8 operation types, based on a rubric-based evaluation framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Current models show significant gaps in capabilities, with an Exact Match Rate below 5% and 0% in complex tasks, indicating challenges in execution precision and robustness.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.07229" target="_blank">https://huggingface.co/papers/2606.07229</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260608233103037.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>47. SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: SoCRATES, LLM mediators, socio-cognitive adaptation, consensus gap, multi-domain testbeds</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To present SoCRATES, a realistic benchmark for evaluating proactive LLM mediators across various socio-cognitive adaptation axes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Construct scenarios from real conflicts using an agentic pipeline in eight domains; probe five socio-cognitive adaptation axes; evaluate using a topic-localized evaluator aligned with human experts.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Even top-performing Large Language Models (LLMs) resolve only about a third of the consensus gap; performance varies sharply by socio-cognitive axis, indicating need for better social adaptation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05563" target="_blank">https://huggingface.co/papers/2606.05563</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260608233029324.png"></figure>
</p>
</div>
<div style="height:30px"></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260608233602529.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260608233902846.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260608233804215.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260608233103037.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>China AI Native Industry Insights &#8211; 20260608 &#8211;  Alibaba &#124; StepFun &#124; Tencent &#124; more</title>
		<link>https://ainativefoundation.org/china-ai-native-industry-insights-20260608-alibaba-stepfun-tencent-more/</link>
		
		<dc:creator><![CDATA[AINF]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 09:22:27 +0000</pubDate>
				<category><![CDATA[China Industry]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/china-ai-native-industry-insights-20260608-alibaba-%e9%98%b6%e8%b7%83%e6%98%9f%e8%be%b0-%e8%85%be%e8%ae%af%e6%b7%b7%e5%85%83-more/</guid>

					<description><![CDATA[Explore TongyiLab's AgentScope 2.0, ResNet's 2026 prize win, Tencent's Stem algorithm.]]></description>
										<content:encoded><![CDATA[<p>Explore TongyiLab&#8217;s AgentScope 2.0, ResNet&#8217;s 2026 prize win, Tencent&#8217;s Stem algorithm. Discover more in Today’s China AI Native Industry Insights.</p>
<h3>1.  Alibaba TongyiLab releases AgentScope 2.0 multi-agent framework with system-level transparency features</h3>
<p>Alibaba&#8217;s TongyiLab announced AgentScope 2.0, a production-ready agent framework focused on system-level transparency for multi-agent applications. The new version includes built-in retry and fallback mechanisms, an observable event system, a smarter permission system, and decoupled workspace with unified agent service. The framework is designed to work with increasingly capable LLMs while maintaining complete visibility into agent operations and decision-making processes.</p>
<p>Read more: <a href="https://github.com/agentscope-ai/agentscope">https://github.com/agentscope-ai/agentscope</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260608_5cf54fe4c7654b49a061c472ba4f7088.jpg"><source src="https://cdn.ainative.foundation/video/20260608_cn_alibaba.mp4" type="video/mp4"></video></p>
<p>Video Credit: @Ali_TongyiLab on X</p>
<h3>2.  ResNet paper co-authored by StepFun Chief Scientist Zhang Xiangyu receives CVPR 2026 Longuet-Higgins Prize</h3>
<p>CVPR 2026 announced that the 2015 paper Deep Residual Learning for Image Recognition (ResNet), co-authored by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, has received the Longuet-Higgins Prize. The prize recognizes research that has demonstrated long-term impact on both academic research and industrial development in computer vision. ResNet introduced residual learning to solve deep neural network training challenges and has become a foundational architecture in modern deep learning, with over 320,000 citations making it the most cited paper of the 21st century. The residual connection concept has expanded beyond computer vision to natural language processing, speech, multimodal systems, and other AI domains.</p>
<p>Read more: <a href="https://mp.weixin.qq.com/s/ZVgqdH_fE42jO4kcI-lF3g">https://mp.weixin.qq.com/s/ZVgqdH_fE42jO4kcI-lF3g</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260608_fd4220eacf1b48819be2314e33d3553c"><source src="https://cdn.ainative.foundation/video/20260608_cn_jyxc.mp4" type="video/mp4"></video></p>
<p>Video Credit: Hyperframes</p>
<h3>3.  Tencent Hunyuan proposes Stem sparse attention algorithm reducing time-to-first-token by 3.6x, paper accepted to ICML 2026</h3>
<p>Tencent Hunyuan announced the Stem sparse attention algorithm, which has been accepted to the machine learning conference ICML 2026. The algorithm introduces Token Position Decay (TPD) and Output-Aware Metric (OAM) innovations to achieve near-dense attention accuracy using only 25% of computational resources. When integrated into the Hunyuan Hy3 preview model with optimized HPC operators, Stem reduces time-to-first-token by 3.6x for 128K context lengths. The algorithm addresses bottlenecks in long-context inference by reallocating computational budget based on causal information flow and evaluating token importance through both attention scores and value vector magnitudes. Both the Stem algorithm and HPC operator implementations have been open-sourced on GitHub.</p>
<p>Read more: <a href="https://mp.weixin.qq.com/s/XneOSvjt-7A-DU546cGoZA">https://mp.weixin.qq.com/s/XneOSvjt-7A-DU546cGoZA</a></p>
<p><video width="600" height="400" controls poster="https://cdn.ainative.foundation/image/20260608_5cf4cfec76fb4901a2a227ed9b7881c3"><source src="https://cdn.ainative.foundation/video/20260608_cn_tx.mp4" type="video/mp4"></video></p>
<p>Video Credit: Hyperframes</p>
<div style="width:100%;height:2px;background:#808080;margin:10px 0"></div>
<p>That’s all for today’s China AI Native Industry Insights. Join us at <a href="https://member.ainativefoundation.org/">AI Native Foundation Membership Dashboard</a> for the latest insights on AI Native, or follow our linkedin account at <a href="https://www.linkedin.com/company/ainativefoundation/">AI Native Foundation</a> and our twitter account at <a href="https://x.com/AINativeF">AINativeF</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/video/20260608_cn_alibaba.mp4" length="20363533" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260608_cn_jyxc.mp4" length="10960357" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/video/20260608_cn_tx.mp4" length="19954863" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Product Insights &#8211; 2026W23</title>
		<link>https://ainativefoundation.org/ai-native-product-insights-2026w23/</link>
		
		<dc:creator><![CDATA[AINF]]></dc:creator>
		<pubDate>Mon, 08 Jun 2026 03:55:20 +0000</pubDate>
				<category><![CDATA[Products]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-product-insights-2026w23/</guid>

					<description><![CDATA[Based on Product Hunt data, we've curated a selection of AI Native applications that demonstrate how AI is being built into the core of modern products. These AI Native solutions showcase new developments in functionality and are exploring fresh ways of human-AI interaction. Let's dive into these AI Native applications.]]></description>
										<content:encoded><![CDATA[<p>Based on Product Hunt data, we&#8217;ve curated a selection of AI Native applications that demonstrate how AI is being built into the core of modern products. These AI Native solutions showcase new developments in functionality and are exploring fresh ways of human-AI interaction. Let&#8217;s dive into these AI Native applications.</p>
<h3>1.  Astra Autonomous Pentest</h3>
<div> <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f3c5.png" alt="🏅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Hunt Data<br />
Ranking: 11<br />
Upvote: 416</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Overview<br />
Astra Autonomous Pentest is an agentic security system that continuously runs offensive testing to discover chained vulnerabilities, validates findings to minimize false positives, and then generates code-ready remediation guidance as native prompts for developer tools like Cursor, Copilot, and Claude Code, turning pentesting into an always-on loop rather than a point-in-time report.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Evaluation<br />
AI Native Application Modernization: 89/100<br />
The product is strongly AI-native because multi-agent discovery, independent validation, and fix generation are the core workflow, not add-ons; it modernizes security operations by connecting detection to developer execution via IDE-native prompts, though real-world impact will depend on how well it fits into diverse SDLCs and governance requirements.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f517.png" alt="🔗" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Website<br />
https://www.getastra.com </p></div>
<p><img decoding="async" style="width:700px" src="https://ph-files.imgix.net/e03b6318-0d25-4000-9604-f6b980591b42.png"/></p>
<h3>2.  Ideogram 4.0</h3>
<div> <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f3c5.png" alt="🏅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Hunt Data<br />
Ranking: 29<br />
Upvote: 252</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Overview<br />
Ideogram 4.0 is an open-weight text-to-image model built for production-grade visual generation, combining prompt-to-image creation with bounding-box layout control, reliable multilingual text rendering, and native 2K outputs for design workflows and developer integrations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Evaluation<br />
AI Native Application Modernization: 86/100<br />
The core system is an AI model trained from scratch and exposed as a controllable generation engine, enabling modern app patterns like programmatic layout constraints, consistent typography in images, and high-resolution output; modernization is strong, with deployment and governance effort still required for enterprise adoption.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f517.png" alt="🔗" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Website<br />
https://ideogram.ai/models/4.0 </p></div>
<p><img decoding="async" style="width:700px" src="https://ph-files.imgix.net/6804dea9-8c36-4a7e-bafa-8380dce7631c.jpeg"/></p>
<h3>3.  Spectron</h3>
<div> <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f3c5.png" alt="🏅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Hunt Data<br />
Ranking: 45<br />
Upvote: 180</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Overview<br />
Spectron is an AI-agent memory substrate that unifies vectors, graph, documents, and relational-style rows under a single ACID transaction model, so agent knowledge can be written, corrected, and retrieved without cross-store sync. It preserves provenance per fact, supports corrections that supersede rather than overwrite, and uses hybrid retrieval with trace-informed ranking to improve agent recall and grounding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Evaluation<br />
AI Native Application Modernization: 86/100<br />
Spectron is strongly AI-native because memory, provenance, correction semantics, and hybrid retrieval are core primitives designed around agent workflows rather than a conventional database with add-on embeddings. The single transactional substrate reduces operational complexity and inconsistency risk, while tri-temporal facts and tenant scoping align with production agent requirements; the main adoption work is mapping existing data models and retrieval stacks onto its unified approach.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f517.png" alt="🔗" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Website<br />
https://surrealdb.com </p></div>
<p><img decoding="async" style="width:700px" src="https://ph-files.imgix.net/5eeb34df-4216-4148-b320-fd256fe0646c.jpeg"/></p>
<h3>4.  Nemotron 3 Ultra by NVIDIA</h3>
<div> <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f3c5.png" alt="🏅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Hunt Data<br />
Ranking: 49<br />
Upvote: 6</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Product Overview<br />
Nemotron 3 Ultra is an open-weights, large-scale Mixture-of-Experts reasoning model designed to run long-horizon agent loops with high throughput and an unusually large context window, and it can be deployed via common model hubs and NVIDIA NIM as an inference microservice for production agent systems.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Evaluation<br />
AI Native Application Modernization: 89/100<br />
This is strongly AI-native because the model is the core execution layer for agent reasoning, with architecture and deployment packaging optimized for multi-step workflows, long context, and scalable serving; the main modernization gap is that teams still need surrounding orchestration, tool execution, and governance to turn raw model capability into end-to-end applications.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f517.png" alt="🔗" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Website<br />
http://www.nvidia.com </p></div>
<p><img decoding="async" style="width:700px" src="https://ph-files.imgix.net/244a8d27-5483-42fc-a9d7-aaf4f44d81c9.png"/></p>
<div style="width:100%;height:2px;background:#808080;margin:10px 0"></div>
<p>Statement: Evaluation results are generated by AI, lack of data support, reference learning only.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Native Daily Paper Digest &#8211; 20260605</title>
		<link>https://ainativefoundation.org/ai-native-daily-paper-digest-20260605/</link>
		
		<dc:creator><![CDATA[insights]]></dc:creator>
		<pubDate>Sat, 06 Jun 2026 00:41:24 +0000</pubDate>
				<category><![CDATA[Papers]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-daily-paper-digest-20260605/</guid>

					<description><![CDATA[1. Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution 🔑 Keywords: Code2LoRA, LoRA adapters, hypernetwork framework, GRU hidden state, repository-specific [&#8230;]]]></description>
										<content:encoded><![CDATA[<h3>1. Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Code2LoRA, LoRA adapters, hypernetwork framework, GRU hidden state, repository-specific </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Code2LoRA, a hypernetwork framework designed for generating repository-specific LoRA adapters to enhance code language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Created RepoPeftBench, a benchmark with static and evolution tracks, to evaluate performance on Python repositories against parameter-efficient fine-tuning baselines.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Code2LoRA-Static and Code2LoRA-Evo achieved significant exact match rates in cross-repo and in-repo scenarios, demonstrating their effectiveness over existing LoRA fine-tuning methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06492" target="_blank">https://huggingface.co/papers/2606.06492</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260605233008664.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>2. TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: TIDE, iterative discovery, thought templates</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to introduce TIDE, a framework for discovering hidden problems in context using templates and iterative methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; TIDE employs two mechanisms: iterative discovery for extending problem coverage and thought templates to anchor predictions within recognizable problem classes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; TIDE demonstrates significant improvements in task coverage, problem identification, and resolution in personal workspaces and software repositories, surpassing traditional single-shot and parallel multi-agent approaches.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04743" target="_blank">https://huggingface.co/papers/2606.04743</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233039273.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>3. VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: VideoKR, Knowledge-Intensive, Human-in-the-loop, Video Reasoning, Expert-Domain</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce VideoKR, a pioneering large-scale training dataset focused on enhancing knowledge- and reasoning-intensive video understanding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Developed a human-in-the-loop, skill-oriented example generation pipeline to cultivate progressively deeper video reasoning capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Models post-trained on VideoKR showcase superior performance on knowledge-intensive video reasoning tasks while maintaining competitiveness in general video reasoning, emphasizing data design&#8217;s pivotal role.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05259" target="_blank">https://huggingface.co/papers/2606.05259</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233105470.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>4. RobotValues: Evaluating Household Robots When Human Values Conflict</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RobotValues, value-conflict scenarios, household robots, default value preferences, VLMs</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Evaluate household robot planners in value-conflict scenarios using the RobotValues benchmark to test their ability to prioritize human values over task completion.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation, and automatic quality control to construct a benchmark with 10,000 scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Vision-language models (VLMs) used in robotics display default value preferences and often fail to prioritize specific conflicting values when instructed, making incorrect decisions 80% of the time.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03312" target="_blank">https://huggingface.co/papers/2606.03312</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233131328.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>5. LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LoomVideo, video generation, video editing, Multimodal Large Language Model, Scale-and-Add conditioning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce LoomVideo, a 5B-parameter efficient architecture for unified video generation and editing.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a Multimodal Large Language Model and Deepstack injection for feature alignment.</p>
<p>   &#8211; Implements Scale-and-Add conditioning to significantly reduce computational cost.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; LoomVideo achieves state-of-the-art performance with superior efficiency and speed, particularly excelling in e-commerce and fashion generation scenarios.</p>
<p>   &#8211; The model provides a 5.41x acceleration in inference speed over similarly capable models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06042" target="_blank">https://huggingface.co/papers/2606.06042</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233157880.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>6. Rethinking Continual Experience Internalization for Self-Evolving LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Experience internalization, Continual learning, Large language models, Capability collapse, Internalization regime</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the mechanisms of experience internalization to enable continual learning in large language models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Systematic examination of three dimensions: experience granularity, experience injection pattern, and internalization regime.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Principle-level experience is more durable than instance-level experience for transferability.</p>
<p>   &#8211; Step-wise injection is superior to global injection for aligning experiences with decision states.</p>
<p>   &#8211; Off-policy context-distillation provides a more stable training signal than on-policy context-distillation for improving stability in experience internalization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04703" target="_blank">https://huggingface.co/papers/2606.04703</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233220938.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>7. The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: KITScenes Multimodal dataset, high-fidelity sensors, HD maps, embodied AI, geographic diversity</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to provide a comprehensive European driving dataset with high-fidelity sensors, including rich 3D maps and diverse urban environments, to advance embodied AI research. </p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a fully synchronized sensor suite comprising high-resolution cameras, long-range lidar, 4D imaging radar, and GNSS/INS for precise localization. The dataset features complete HD maps validated through autonomous driving trials.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The KITScenes Multimodal dataset enriches existing datasets by offering unprecedented map completeness and geographic diversity, setting benchmarks in HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.02956" target="_blank">https://huggingface.co/papers/2606.02956</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260605233251977.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>8. LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: PropMe, memorization evaluation, SimpleTrace, propensity-aware framework, prefix-based capability attacks</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to evaluate language model memorization by differentiating between forced reproduction capabilities and natural propensity using the PropMe framework.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduces PropMe, a propensity-aware framework, and SimpleTrace, a lightweight tracing tool. Utilizes propensity-transformed metrics across open models and datasets, focusing on prefix-based capability attacks versus non-adversarial evaluations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The study concludes that large language models can reveal training data when directly elicited, but do so less frequently under non-adversarial circumstances. It also highlights the importance of assessing both worst-case extractability and ordinary leakage propensity for a comprehensive view of memorization capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06286" target="_blank">https://huggingface.co/papers/2606.06286</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233318213.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>9. MAOAM: Unified Object and Material Selection with Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: unified vision-language model, object selection, material selection, interactive image editing, MAOAM</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce MAOAM, a unified selection framework to enhance object and material selection via text and click interactions, supporting diverse editing workflows with improved robustness.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a vision-language model (VLM) with a segmentation head to create pixel-accurate masks from user-defined prompts, aiming at both object and material selection.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Demonstrated accurate and coherent selection capabilities in diverse scenarios, improving image editing workflows by integrating text and click-based interactions effectively.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04880" target="_blank">https://huggingface.co/papers/2606.04880</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233341108.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>10. Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Memory-augmented language models, Belief Entropy, Metacognitive Memory Policy Optimization, long-horizon tasks, epistemic uncertainty</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to improve performance in memory-augmented language models tackling long-horizon tasks by focusing on memory quality instead of solely outcome success.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study introduces Belief Entropy as a self-supervised proxy to assess uncertainty about latent task states.</p>
<p>   &#8211; Metacognitive Memory Policy Optimization (MMPO) is proposed to provide memory-specific supervision and penalize summaries increasing epistemic uncertainty, departing from traditional outcome-based signals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments demonstrate that MMPO consistently outperforms existing methods in diverse long-horizon tasks, maintaining high performance even in large contexts.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.30159" target="_blank">https://huggingface.co/papers/2605.30159</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233436250.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>11. World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: World-language-action models, Autoregressive Transformer, Long-horizon task execution, Cross-embodiment learning, State-of-the-art</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to develop world-language-action (WLA) models, integrating textual instructions, images, and robot state predictions to efficiently execute long-horizon tasks and enhance cross-embodiment learning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; An autoregressive (AR) Transformer backbone is used to predict future states by combining semantic-level textual intentions with fine-grained physical dynamics, enabled by a World Expert for supervising physical dynamics and meta-queries for world prediction impacting action generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The WLA-0 prototype, with 2B active parameters, demonstrates state-of-the-art capabilities in multi-task and long-horizon learning in simulated and real-world environments, achieving notable success rates on RoboTwin2.0 Clean and RMBench tasks, and shows potential for learning novel tasks from cross-embodiment robot videos without action annotations.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05979" target="_blank">https://huggingface.co/papers/2606.05979</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><video controls="true" autoplay="true" muted="true" width="600" src="https://cdn.ainative.foundation/huggingface/20260605233405955.mp4"></video> </figure>
</p>
</div>
<div style="height:30px"></div>
<h3>12. AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AffordanceVLA, Vision-Language-Action, perception-action mapping, affordance forecasting, robotic manipulation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; AffordanceVLA aims to establish a precise perception-action mapping by integrating structured affordance forecasting with Vision-Language models to enhance robotic manipulation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework utilizes a Mixture-of-Transformer architecture with specialized experts, combined with a three-stage training strategy and automated data augmentation to tackle data scarcity issues in robotic datasets.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AffordanceVLA demonstrates strong performance in diverse manipulation scenarios through its spatially grounded and semantically conditioned affordance cues, bridging the gap between vision, language, and action.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06155" target="_blank">https://huggingface.co/papers/2606.06155</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233532562.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>13. MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: MLEvolve, LLM-based, multi-agent framework, machine learning algorithm discovery, self-evolving</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce MLEvolve, an LLM-based self-evolving multi-agent framework designed for machine learning algorithm discovery to overcome existing limitations in long-horizon tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes Progressive MCGS to enhance search mechanisms with graph-based reference edges and an entropy-inspired progressive schedule.</p>
<p>   &#8211; Implements Retrospective Memory for dynamic knowledge retrieval and reuse to facilitate agent evolution.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MLEvolve exhibits state-of-the-art performance on MLE-Bench across various metrics and outperforms methods like AlphaEvolve in mathematical algorithm optimization tasks, showcasing strong cross-domain generalization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06473" target="_blank">https://huggingface.co/papers/2606.06473</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233512539.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>14. SePO: Self-Evolving Prompt Agent for System Prompt Optimization</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Self-Evolving Prompt Optimization, evolutionary search, task agents, prompt optimization, fine-tuning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to enhance AI agent performance by jointly optimizing both task and prompt agent system prompts using a novel method called Self-Evolving Prompt Optimization (SePO).</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs an evolutionary search strategy with a self-referential design, using a single prompt agent to improve both task agents’ system prompts and its own system prompt.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Self-Evolving Prompt Optimization significantly outperforms existing methods across five different benchmarks, demonstrating an average accuracy improvement of 4.49 points over the Manual-CoT technique. The optimization skill gained from pre-training efficiently generalizes across various tasks beyond the pre-training mixture.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04465" target="_blank">https://huggingface.co/papers/2606.04465</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233557479.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>15. SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Cloud Robotics, Learned Latent, JPEG Compatibility, Asymmetric Autoencoders, SEAOTTER</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study presents a compression framework for cloud robotics that merges learned latent representations with standard JPEG compatibility to enhance encoding and decoding speed while maintaining high perceptual quality.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of the SEAOTTER framework, which pairs a Sensor Embedded Autoencoder with a One-Time Transcode for Efficient Reconstruction, utilizing a learnable JPEG color and quantization transform to improve accuracy in various perception tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed framework achieves significant improvements over AVIF, with 7 times faster encoding, 3.5 times faster decoding, and +8% ImageNet top-1 accuracy, all while preserving compatibility with existing JPEG infrastructure.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03940" target="_blank">https://huggingface.co/papers/2606.03940</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233623442.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>16. Regret Minimization with Adaptive Opponents in Repeated Games</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Repeated Policy Regret, adaptive opponents, game-theoretic, non-convex optimization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study focuses on minimizing regret in repeated games with adaptive opponents by introducing a new metric: Repeated Policy Regret.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Identification of necessary conditions to achieve sublinear RP-Regret.</p>
<p>   &#8211; Proposal of three algorithms designed for minimizing non-convex RP-Regret.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Minimizing RP-Regret leads to finding better equilibria and more cooperative solutions in repeated games like Stag-Hunt.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06486" target="_blank">https://huggingface.co/papers/2606.06486</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233733997.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>17. Latent Reasoning with Normalizing Flows</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Latent reasoning, normalizing flows, chain-of-thought, probabilistic sampling, KV-cache decoding</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance large language models&#8217; reasoning capabilities by integrating latent reasoning through normalizing flows without losing the benefits of autoregressive generation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework, called NF-CoT, uses normalizing flows to conduct intermediate computations in continuous states, maintaining compatibility with left-to-right generation and probabilistic sampling.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; NF-CoT has been shown to improve code-generation pass rates and reduce reasoning costs compared to explicit chain-of-thought methods and previous latent reasoning frameworks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06447" target="_blank">https://huggingface.co/papers/2606.06447</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233711895.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>18. MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Mechanical engineering drawing, Multimodal Large Language Models, MechVQA, High-density visual question answering, Domain knowledge</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Computer Vision</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To enhance understanding of mechanical engineering drawings using a specialized dataset and domain-specific model.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of MechVQA dataset containing 3.3k images and 21k question-answer pairs.</p>
<p>   &#8211; Development of MechVL model through a multi-stage training paradigm.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; MechVL model outperforms existing baselines by 7.57 percentage points, providing a reusable foundation for deploying MLLMs in mechanical design and inspection.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.30794" target="_blank">https://huggingface.co/papers/2605.30794</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233648830.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>19. LLM Anonymization Against Agentic Re-Identification</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM-powered, anonymization, re-identification, contextual utility, adaptive privacy scope</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Ethics and Fairness</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop AURA, a framework to balance privacy protection and utility retention in text anonymization using LLM-powered methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilization of adaptive privacy scopes and mask-reconstruct methods evaluated against re-identification attacks from web-search agents and utility based on real-user interviews.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AURA enhances the privacy-utility frontier by improving resistance to agentic re-identification while preserving contextual utility through adaptive privacy and mask-reconstruct techniques.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.30848" target="_blank">https://huggingface.co/papers/2605.30848</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233842948.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>20. Video2LoRA: Parametric Video Internalization for Vision-Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Video2LoRA, Low-Rank Adaptation, video processing, vision-language models, inference cost</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The main objective is to enhance video processing efficiency in vision-language models by predicting Low-Rank Adaptation weights, thereby reducing computational costs while retaining video-faithful outputs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The introduction of Video2LoRA method involves a perceiver hypernetwork that reads intermediate representations from a frozen Vision-Language Model (VLM) to generate Low-Rank Adaptation adapters in a single forward pass.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Video2LoRA achieves equivalent performance to direct video-in-context inference, reducing answer-time visual-token load and query TTFT significantly while being stable for extensive frame and pixel ranges.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04351" target="_blank">https://huggingface.co/papers/2606.04351</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233819353.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>21. Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Discrete-WAM, Autonomous Driving, Causal Reasoning, Discrete Tokens, Discrete Diffusion Framework</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce Discrete-WAM, a unified discrete latent vision-action world policy for autonomous driving, enabling compositional causal reasoning and counterfactual reasoning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilize aligned discrete tokens and a shared discrete diffusion framework for compositional generalization across diverse driving scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Discrete-WAM achieves competitive performance on large-scale autonomous-driving benchmarks, supporting controllable generation and offering a principled path to more reliable decision-making.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05645" target="_blank">https://huggingface.co/papers/2606.05645</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233757858.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>22. Trust Region Q Adjoint Matching</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Off-policy reinforcement learning, Trust Region Q-Adjoint Matching, pretrained flow policies, projected dual descent, model collapse</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to address the instability in off-policy reinforcement learning by introducing Trust Region Q-Adjoint Matching (TRQAM) to ensure stable fine-tuning of pretrained flow policies.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; TRQAM uses projected dual descent to adaptively control the path-space KL divergence, and optimizes the trust-region parameter (λ) in stochastic optimal control dynamics to stabilize the learning process.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments on 50 OGBench tasks demonstrate that TRQAM consistently outperforms existing approaches in offline RL and offline-to-online RL, achieving a 68% success rate compared to a strong baseline of 46%.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.27079" target="_blank">https://huggingface.co/papers/2605.27079</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233932142.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>23. Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Atomic Decomposition, Recombination, Verifiable Code Tasks, Reinforcement Learning, Large Language Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To propose the Atomic Decomposition and Recombination (ADR) framework for generating novel and challenging verifiable code tasks to enhance the scalability of Reinforcement Learning with Verifiable Rewards (RLVR) in Large Language Models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The ADR framework decomposes tasks into atomic elements and performs controlled recombination to produce new tasks, surpassing the limitations of previous heuristic approaches for data synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ADR demonstrates excellent originality, difficulty, diversity, and test quality over existing methods and improves coding performance across RLVR in various domains such as algorithmic programming, tool usage, and data science.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2605.31058" target="_blank">https://huggingface.co/papers/2605.31058</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233908763.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>24. Quality-Guided Semi-Supervised Learning for Medical Image Segmentation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Semi-supervised learning, pseudolabels, segmentation quality, quality predictor, medical image segmentation</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Healthcare</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a quality-guided semi-supervised learning framework that enhances medical image segmentation by improving pseudolabel reliability and segmentation performance.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduction of a dedicated quality predictor trained on variable-quality masks from synthetic corruptions and partially trained segmentation models.</p>
<p>   &#8211; Integration of the quality predictor into SSL through quality-aware regularization loss and quality-based pseudolabel sample reweighting.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Proposed method consistently improves over existing semi-supervised learning methods in medical image segmentation, validated through extensive experiments across five datasets and multiple architectures.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.01753" target="_blank">https://huggingface.co/papers/2606.01753</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605234023039.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>25. ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM agents, forward-looking research, ForeSci, decision-making systems, AI domains</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduction of ForeSci, a benchmark aimed at evaluating the ability of LLM agents to make forward-looking research decisions based on historical evidence in AI domains.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Creation of tasks derived from pre-cutoff taxonomy branches and use of specific answer-generation backbones preceding task cutoffs to enhance accuracy and traceability.</p>
<p>   &#8211; Evaluation of native LLMs, Hybrid RAG, and research-agent adaptations across various backbones to test evidence organization and decision-making capabilities.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Evidence organization improves traceability and factual support.</p>
<p>   &#8211; The effectiveness of evidence organization depends significantly on the decision family, with a noted challenge of evidence-decision decoupling affecting research judgements.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.00644" target="_blank">https://huggingface.co/papers/2606.00644</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233956980.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>26. </h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="" target="_blank"></a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/thirteen/202606051780702843.jpg"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>27. SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Models, Coding Agents, Environment-Aware Operational Safety, Safety Profiles, Harmful Safety-Violation Rate</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce SABER, a benchmark for evaluating the safety of large language models as coding agents in realistic project environments.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; SABER places models in realistic agent-style projects to assess safety based on the action sequence&#8217;s final environment state rather than binary prompt refusals.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Even the best large language models have over a 54% harmful safety-violation rate, revealing current alignment deficiencies and distinct safety profiles across models.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.01317" target="_blank">https://huggingface.co/papers/2606.01317</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605234033691.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>28. BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: BRepCLIP, Multimodal Representation Learning, Contrastive Pretraining, CAD Models, Boundary Representations</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Introduce BRepCLIP, a framework for aligning boundary representation (BRep) geometry of CAD models with language and image embeddings using contrastive pretraining.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Model CAD objects as sequences of face and edge tokens with discrete vocabularies for surface and curve geometry, augmented with spatial and semantic descriptors. Use a transformer encoder to aggregate these into a global BRep embedding aligned with CLIP&#8217;s text and image encoders.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; BRepCLIP achieves superior retrieval and classification performance over point-based methods, showing significant improvements in retrieval and classification scores across various datasets and proving effective as a CAD-aware similarity metric.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05515" target="_blank">https://huggingface.co/papers/2606.05515</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605234009383.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>29. Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: RE-Edit, image editing systems, reasoning dimensions, logical consistency, Diffusion-based image editing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To introduce RE-Edit, a benchmark for reasoning-aware image editing that evaluates systems across five reasoning dimensions: physical, environmental, cultural, causal, and referential.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Development of a benchmark comprising 1,000 curated samples, each designed to test the logical consistency of image editing systems beyond visual plausibility.</p>
<p>   &#8211; Evaluation of ten open-source and two commercial image editing models using dimension-aligned criteria for fine-grained analysis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Finding that even advanced image editing systems struggle with multi-dimensional reasoning despite high-quality visuals.</p>
<p>   &#8211; Introduction of a lightweight reasoning-guided post-edit baseline, demonstrating the potential of explicit reasoning to improve model performance in a model-agnostic manner.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05172" target="_blank">https://huggingface.co/papers/2606.05172</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233943243.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>30. Multimodal Music Recommendation System using LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: multimodal framework, session-based music recommendation, LLM-based sequential reasoning, audio and lyric embeddings, cross-modal integration</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to enhance music recommendation accuracy by integrating audio, lyric, and semantic signals using a multimodal framework that employs LLM-based sequential reasoning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study adopts a multimodal framework to enrich the LastFM-1K dataset with audio and lyric embeddings, LLM-generated semantic metadata, and listening completion ratios.</p>
<p>   &#8211; The research leverages E4SRec, extending it with various item ID encoder backbones and LLM backbones, including SASRec, BERT4Rec, GRU4Rec, LLaMa-2-13B, Qwen2.5-7B-Instruct, and LLaMa-3-70B.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The integration of content-based features significantly improves recommendation accuracy, demonstrating up to 95% improvement in Recall and 79% in NDCG.</p>
<p>   &#8211; The study highlights challenges in cross-modal integration, noting that naive multimodal fusion does not always yield additive improvements.</p>
<p>   &#8211; A large-scale multimodal benchmark for music recommendation is released.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.00125" target="_blank">https://huggingface.co/papers/2606.00125</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233920082.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>31. The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Large Language Models, Arithmetic Fragility, Geometric Structures, Noisy Quantization Model, Geometric Slippages</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Analyze the geometric structures causing arithmetic fragility in Large Language Models (LLMs) and propose a new model to address these issues.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce and utilize the Noisy Quantization Model to explain arithmetic errors in LLMs and employ geometric frameworks to detect and correct quantization failures during inference.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Identified and explained the Iso-Raw-Sum Trajectory (IRST) as a key structure in arithmetic fragility, validating the insights through geometric consistency checks that successfully detect and correct arithmetic errors.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03645" target="_blank">https://huggingface.co/papers/2606.03645</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233858173.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>32. Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Financial AI agents, InKH, knowledge management, temporal memory, AI Native</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI in Finance</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces the Interaction-native Knowledge Harness (InKH) architecture to embed complexity within financial AI agents, reducing the need for users to manage this complexity.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilized a controlled synthetic benchmark with 24 random seeds, 4 rounds, and 80 episodes per round, comparing InKH against 6 baselines in 46,080 evaluations.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; InKH significantly reduces latency, token cost, and stale-knowledge usage, while improving task quality and traceability, demonstrating that system-absorbed complexity enhances financial AI agent efficacy.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.01886" target="_blank">https://huggingface.co/papers/2606.01886</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233832935.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>33. Benchmark Everything Everywhere All at Once</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Automated benchmark creation, LLMs, scalability, domain-specific reasoning, Benchmark Agent</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Develop an automated system, Benchmark Agent, for creating diverse evaluation datasets to facilitate continuous model assessment.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework encompasses user query analysis, subtask design, data annotation, and quality control. Evaluations include human assessments and consistency checks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Benchmark Agent produces high-quality benchmarks with minimal human intervention and highlights weaknesses in current models, particularly in domain-specific reasoning tasks.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06462" target="_blank">https://huggingface.co/papers/2606.06462</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233808714.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>34. AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Intent Inference, Implicit Needs, Gap Scoring, Tool Usage, Probe Consumption</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Knowledge Representation and Reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Enhance query answering by estimating implicit needs and optimizing tool usage through an intent inference step.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Incorporate an inference step that produces an IntentFrame to estimate implicit needs.</p>
<p>   &#8211; Utilize gap scoring to control per-query probe budget and tool selection.</p>
<p>   &#8211; Benchmark with a 100-query four-scene implicit-intent dataset.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AURA achieves improved implicit-need coverage compared to standard approaches, with significant gains across multiple scenes.</p>
<p>   &#8211; The approach reduces probe consumption and maintains privacy compliance on factual lookups.</p>
<p>   &#8211; The improvement is attributed to gap calibration rather than answer memorization.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05557" target="_blank">https://huggingface.co/papers/2606.05557</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233745435.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>35. EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: EvoDS, Autonomous Skill Acquisition, Adaptive Context Compression, Reinforcement Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: AI Systems and Tools</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance automated data science capabilities by introducing a self-evolving autonomous data science agent, EvoDS, which utilizes skill acquisition and adaptive context management.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Implemented two strategies: Autonomous Skill Acquisition (ASA) and Adaptive Context Compression (ACC), within a two-stage multi-agent training scheme, leveraging reinforcement learning principles to improve context management and skill synthesis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; EvoDS demonstrates a significant performance improvement, outperforming state-of-the-art data science agents by an average of 28.9% across various benchmarks and eliminating out-of-token failures.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03841" target="_blank">https://huggingface.co/papers/2606.03841</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233722985.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>36. Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: LLM-based stance simulation, context sensitivity, counterfactual context revision, multimodal approaches</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Investigate how Large Language Models (LLMs) simulate social media user stances, focusing on context sensitivity in counterfactual scenarios.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Applied controlled revision strategies to both text-only and multimodal conversational contexts to simulate stance changes.</p>
<p>   &#8211; Evaluated the effectiveness of these strategies using metrics such as average directional stance shift and stance transition rate.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Both text-only and multimodal approaches exhibit robust stance transitions, highlighting the importance and complexity of context sensitivity in LLM-based stance simulations.</p>
<p>   &#8211; The study provides a framework for understanding these simulations, bringing attention to both the potential and risks of using LLMs for simulating opinion dynamics online.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06443" target="_blank">https://huggingface.co/papers/2606.06443</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233700337.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>37. AdaCodec: A Predictive Visual Code for Video MLLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AdaCodec, Video MLLMs, Visual Tokens, Inter-frame Changes, Predictive Visual Code</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces AdaCodec, a system designed to reduce redundancy in video encoding by selectively transmitting full visual tokens only when scene prediction fails.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; AdaCodec operates by first sending a full reference frame when the scene cannot be predicted well from prior context, otherwise it encodes compact inter-frame changes.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; AdaCodec significantly outperforms a baseline model by improving efficiency in visual token usage and reducing time-to-first-token from 9.26s to 1.62s across multiple benchmarks, even at reduced visual-token budgets.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.02569" target="_blank">https://huggingface.co/papers/2606.02569</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233637271.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>38. Flash-WAM: Modality-Aware Distillation for World Action Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Flash-WAM, Modality-aware, World-action models, Real-time inference, Consistency function</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study introduces Flash-WAM, a modality-aware step-distillation framework to enhance World-action models (WAMs) for real-time inference by addressing inconsistencies across noise regimes in video and action streams.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The paper employs a step-distillation process inspired by consistency distillation, utilizing different parametrization techniques (linear-gradient-scaling and variance-preserving) to optimize video and action streams&#8217; noise conditions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Flash-WAM significantly reduces latency and maintains high task success rates in simulation benchmarks, enabling real-time inference on platforms like RoboTwin 2.0 and improving performance on real-world tasks compared to naive consistency distillation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05254" target="_blank">https://huggingface.co/papers/2606.05254</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233612510.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>39. Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Automatic Speech Recognition, code-switching ASR, synthetic CS speech generation, model merging, domain generalization</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To investigate the generalization capabilities of code-switching ASR models across unseen language pairs using model merging and domain generalization methods.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employed model merging and domain generalization techniques to test the transferability of bilingual code-switching capabilities to new language pairs.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Merged bilingual CS-ASR models showed modest generalization to unseen language pairs, indicating limited transferability of bilingual CS capabilities.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05846" target="_blank">https://huggingface.co/papers/2606.05846</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233543689.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>40. Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: GeoVR, 3D awareness, geometric knowledge distillation, semantic latent space, spatial intelligence</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; GeoVR aims to enhance multimodal large language models (MLLMs) by introducing 3D awareness through a novel framework that restructures their semantic latent space using geometric knowledge from 3D foundation models.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The framework utilizes a multi-objective learning strategy employing four geometric targets, including camera pose estimation, dense depth map regression, metric scale factor prediction, and multi-scale 3D feature distillation, to develop strong 3D understanding.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; GeoVR, through extensive experiments on spatial reasoning benchmarks, demonstrates state-of-the-art performance and establishes a new paradigm in endowing foundation models with spatial intelligence.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05833" target="_blank">https://huggingface.co/papers/2606.05833</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233523030.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>41. Towards One-to-Many Temporal Grounding</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: One-to-Many Temporal Grounding, Temporal Grounding, Count Accuracy, Effective Temporal F1, Chain-of-Thought reasoning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper aims to address the challenge of One-to-Many Temporal Grounding by introducing a comprehensive benchmark, novel reward functions, and improved policy optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Establishes the first comprehensive benchmark for One-to-Many Temporal Grounding (OMTG) with new evaluation metrics like Count Accuracy and Effective Temporal F1.</p>
<p>   &#8211; Develops a high-quality OMTG dataset with 56k samples and novel temporal and caption reward functions utilizing Chain-of-Thought reasoning for policy optimization.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed model achieves a new state-of-the-art Effective Temporal F1 of 43.65% on the OMTG benchmark, outperforming previous models by significant margins.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06294" target="_blank">https://huggingface.co/papers/2606.06294</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233457763.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>42. Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Future-L1, Video event prediction, Latent visual reasoning, Autoregressive decoding, State-of-the-art</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Multi-Modal Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Future-L1 is designed to improve video event prediction by maintaining visual semantics in latent space during autoregressive decoding, aiming for enhanced prediction accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The Future-L1 framework alternates between language tokens and continuous latent visual spans.</p>
<p>   &#8211; Constructs Future-L1-50K dataset and employs LA-DAPO, a latent-aware RL objective with outcome-contrastive and temporal-diversity rewards, to optimize latent trajectories.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Future-L1 achieves state-of-the-art results, significantly improving benchmark scores on FutureBench and TwiFF-Bench, demonstrating the benefits of preserving visual semantics in latent space rather than converting all reasoning into text.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05769" target="_blank">https://huggingface.co/papers/2606.05769</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233425008.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>43. The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Inference-time scaling, Large Language Models, constrained optimization, economic principles, global shadow price</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to enhance performance in resource-constrained environments by improving inference-time scaling for Large Language Models through a novel economic principle-based optimization strategy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The paper formulates inference budget allocation as a global constrained optimization problem, employing economic principles and modeling per-query reasoning utility with a shifted-surge function to derive an optimal allocation policy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The proposed Constrained Latent-utility Equilibrium Allocation for Reasoning (CLEAR) strategy reallocates resources to solvable queries, significantly improving the Pareto frontier of token cost versus mean accuracy. In resource-scarce regimes, CLEAR enhances global accuracy up to 3 times compared to uniform allocation.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.03092" target="_blank">https://huggingface.co/papers/2606.03092</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233353258.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>44. OPRD: On-Policy Representation Distillation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: On-Policy Representation Distillation, hidden-state space, sampling variance, training efficiency</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; Improve the traditional on-policy distillation by aligning student and teacher representations in the hidden-state space to reduce variance and improve training efficiency.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce On-Policy Representation Distillation (OPRD) that aligns student and teacher representations in hidden-state space, bypassing the LM head and providing richer per-layer structural information.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; OPRD closes the student-teacher gap effectively, trains 1.44x faster, and uses 54% less memory than existing top-k OPD methods.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06021" target="_blank">https://huggingface.co/papers/2606.06021</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233329337.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>45. Unsupervised Skill Discovery for Agentic Data Analysis</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: DataCOPE, data-analytic agent, skill discovery, Adaptive Checklist Verifier, Answer Agreement Verifier</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Machine Learning</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The main objective of the research is to develop DataCOPE, an unsupervised framework that discovers reusable data-analysis skills to improve the performance in report-style and reasoning-style tasks.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; DataCOPE utilizes verifier-guided exploration, deriving verifier signals from exploration trajectories to evaluate quality and agreement. It involves a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. The framework is instantiated with an Adaptive Checklist Verifier for report-style analysis and an Answer Agreement Verifier for reasoning-style analysis.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; DataCOPE consistently enhances performance over baseline models, achieving an average improvement in mean score by 9.71% for report-style tasks and 32.30% for reasoning-style tasks across various model settings.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06416" target="_blank">https://huggingface.co/papers/2606.06416</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233306048.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>46. Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Video generation models, robotic manipulation, physics simulator, trajectory fidelity, execution success</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Robotics and Autonomous Systems</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To evaluate video generation models on their ability to reflect physical reality through robotic manipulation tasks, and assess if visual quality predicts executable motion accuracy.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduce Dream.exe, an evaluation framework with a video-to-execution pipeline to assess the execution ability of videos generated for robotic tasks in a physics simulator.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Several models achieved measurable execution success, indicating that generative priors learned from large datasets may contain meaningful physical knowledge, although visual quality does not reliably predict executability.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.04811" target="_blank">https://huggingface.co/papers/2606.04811</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233235403.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>47. Complexity-Balanced Diffusion Splitting</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Complexity-Balanced Splitting, generative capacity, temporal capacity allocation, diffusion timeline, synthesis quality</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Generative Models</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The objective is to improve synthesis quality in generative models without increasing inference costs by introducing Complexity-Balanced Splitting (CBS), which allocates generative capacity across specialized sub-networks based on local complexity measures.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; CBS divides the diffusion timeline into segments with equal approximation burden using two monitor functions: a spatial measure based on Dirichlet energy and a geometric measure based on sampling trajectories&#8217; acceleration. A lightweight auxiliary model estimates local complexity profiles to optimize temporal partitioning.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; CBS consistently enhances synthesis quality across various architectures and datasets, achieving a ~35% improvement in FID on SiT-XL with CFG compared to naive temporal partitioning, without additional inference cost.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06477" target="_blank">https://huggingface.co/papers/2606.06477</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233208661.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>48. Personal AI Agent for Camera Roll VQA</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Conversational AI, Visual Question Answering, Hierarchical Memory, Personal Camera Roll</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Human-AI Interaction</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; To develop a Conversational AI agent for answering visual questions using a personal camera roll with hierarchical memory and specialized tools.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Creation and manual annotation of a dataset named camroll containing 50 users, 31,476 images, and 2,500 QA pairs to support real-world usage.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The camroll-agent demonstrates superior performance in long-context understanding compared to existing baselines, indicating the need for different approaches in AI agents for personalized visual memory versus textual memory.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05275" target="_blank">https://huggingface.co/papers/2606.05275</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233145658.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>49. Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Reinforcement learning, Large language models, Zero-shot transfer, Meta-skill, Linguistic context</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The research aims to enhance the ability of large language models to translate unseen languages by utilizing in-context linguistic knowledge instead of memorizing specific languages.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; The study employs a reinforcement learning approach using a surface-level translation metric, chrF, as the reward to train models on unseen language translation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; The reinforcement learning-trained models successfully leverage linguistic context, outperforming in-context learning and supervised fine-tuning, and potentially apply RL to tasks beyond traditional reasoning such as language learning from context.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.06428" target="_blank">https://huggingface.co/papers/2606.06428</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233120155.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>50. AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: AdaPlanBench, adaptive planning, Large Language Model, dual constraints, interactive benchmark</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The paper introduces AdaPlanBench, a benchmark for evaluating Large Language Models (LLMs) on their capacity to adaptively plan and re-plan under progressively revealed world and user constraints.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Utilizes a set of 307 household tasks and a scalable constraint construction pipeline to augment each task with dual constraints.</p>
<p>   &#8211; LLM agents interact through a multi-turn protocol where constraints are revealed only when violated, requiring iterative plan revisions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; Experiments with ten leading LLMs indicate that adaptive planning under dual constraints is challenging, with a maximum accuracy of 67.75%.</p>
<p>   &#8211; Performance is particularly degraded by user constraints and is affected by weaker physical grounding and reduced effectiveness.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05622" target="_blank">https://huggingface.co/papers/2606.05622</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233051357.png"></figure>
</p>
</div>
<div style="height:30px"></div>
<h3>51. ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?</h3>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f511.png" alt="🔑" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Keywords: Role-playing language agents, ArcANE, Character Arc, Narrative Evaluation, psychological trajectory</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Category: Natural Language Processing</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f31f.png" alt="🌟" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Objective:</p>
<p>   &#8211; The study aims to develop and evaluate benchmarks for Role-playing language agents that focus on dynamic character development and psychological trajectory alignment rather than static factual recall.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f6e0.png" alt="🛠" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Methods:</p>
<p>   &#8211; Introduced ArcANE, a benchmark evaluating character development across 17 novels and 80 principal characters, using an approach that segments narratives into phases along a psychological axis to test model responses.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Research Conclusions:</p>
<p>   &#8211; ArcANE-conditioned models demonstrate superior performance compared to other context strategies, especially in scenarios unexplored in source texts, with fine-tuned models (ArcANE-8B/32B) further enhancing performance outside source text contexts.</p>
</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f449.png" alt="👉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Paper link:&nbsp;<a href="https://huggingface.co/papers/2606.05553" target="_blank">https://huggingface.co/papers/2606.05553</a></p>
<div class="wp-block-image">
<figure class="aligncenter"><img loading="lazy" decoding="async" width="660" height="660" src="https://cdn.ainative.foundation/huggingface/20260605233022416.png"></figure>
</p>
</div>
<div style="height:30px"></div>
]]></content:encoded>
					
		
		<enclosure url="https://cdn.ainative.foundation/huggingface/20260605233008664.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260605233251977.mp4" length="0" type="video/mp4" />
<enclosure url="https://cdn.ainative.foundation/huggingface/20260605233405955.mp4" length="0" type="video/mp4" />

			</item>
		<item>
		<title>AI Native Weekly Newsletter: 06 June 2026</title>
		<link>https://ainativefoundation.org/ai-native-weekly-newsletter-06-june-2026/</link>
		
		<dc:creator><![CDATA[AINF]]></dc:creator>
		<pubDate>Fri, 05 Jun 2026 18:12:06 +0000</pubDate>
				<category><![CDATA[Newsletter]]></category>
		<guid isPermaLink="false">https://ainativefoundation.org/ai-native-weekly-newsletter-06-june-2026/</guid>

					<description><![CDATA[This week shows how quickly AI is moving from frontier research into tools, devices, and physical systems people can actually use. Anthropic’s latest analysis puts recursive self-improvement back in focus, with Claude now writing more than 80% of the code merged into its production systems. That progress is already reaching users: OpenAI extended Codex with Sites, turning plain-language prompts into interactive websites and apps, while Google released Gemma 4 12B, a multimodal model small enough to run locally on a laptop. On Windows, Microsoft’s OpenClaw and NVIDIA RTX Spark point to a new layer of local-agent infrastructure, combining secure execution, enterprise control, and hardware purpose-built for running AI agents on personal machines. And with Unitree’s H2 Plus, the story moves into robotics, as AI begins to step out of software workflows and into machines that move.]]></description>
										<content:encoded><![CDATA[<p><head></p>
<style>
        body {
            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
            line-height: 1.6;
            color: #333;
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
        }</p>
<p>        .read-time {
            color: #666;
            font-size: 0.9em;
            margin-bottom: 20px;
        }</p>
<p>        h1 {
            font-size: 2.8em;
            color: #000;
            margin-bottom: 10px;
            font-weight: normal;
        }</p>
<p>        .author-date {
            color: #666;
            font-size: 0.9em;
            margin-bottom: 30px;
        }</p>
<p>        .intro {
            font-size: 1.1em;
            margin: 30px 0;
            color: #333;
        }</p>
<p>        .contents {
            margin: 40px 0;
        }</p>
<p>        .contents h2 {
            color: #000;
            font-size: 2em;
            margin-bottom: 20px;
        }</p>
<p>        .contents ul {
            list-style: disc;
            padding: 0;
            padding-left: 20px;
        }</p>
<p>        .contents li {
            margin: 10px 0;
            padding-left: 10px;
        }</p>
<p>        .contents a {
            color: #0066cc;
            text-decoration: none;
        }</p>
<p>        .contents a:hover {
            text-decoration: underline;
        }</p>
<p>        .education-section {
            margin: 40px 0;
        }</p>
<p>        .education-section h2 {
            font-size: 2.5em;
            color: #000;
            margin-bottom: 30px;
        }</p>
<p>        .promo-banner {
            background: #FFE4B5;
            border-radius: 8px;
            overflow: hidden;
            margin-bottom: 30px;
        }</p>
<p>        .banner-image {
            width: 100%;
            height: auto;
            display: block;
        }</p>
<p>        .education-points {
            margin-top: 30px;
        }</p>
<p>        .education-points ul {
            list-style: none;
            padding: 0;
        }</p>
<p>        .education-points li {
            margin: 15px 0;
            font-size: 1.1em;
        }</p>
<p>        .education-points strong {
            color: #000;
        }</p>
<p>        .cta-button {
            display: inline-block;
            background: #000;
            padding: 12px 30px;
            border-radius: 25px;
            margin-top: 20px;
        }</p>
<p>        .cta-button a {
            color: #fff!important;
            text-decoration: none;
            font-weight: bold;
        }</p>
<p>        .cta-button:hover {
            background: #333;
        }</p>
<p>        .cta-button:hover a {
            text-decoration: none;
        }</p>
<p>        .kubecon-section {
            margin: 60px 0;
        }</p>
<p>        .kubecon-section h2 {
            font-size: 2.5em;
            color: #000;
            margin-bottom: 30px;
        }</p>
<p>        .event-banner {
            margin-bottom: 30px;
            border-radius: 8px;
            overflow: hidden;
            max-width: 800px;
            margin-left: 0;
            margin-right: 0;
        }</p>
<p>        .event-graphics {
            padding: 0;
        }</p>
<p>        .event-image {
            width: 100%;
            height: auto;
            display: block;
            border-radius: 8px;
        }</p>
<p>        .event-info {
            padding: 20px;
            background: #fff;
            text-align: center;
        }</p>
<p>        .event-details h3 {
            color: #6f42c1;
            font-size: 1.5em;
            margin: 15px 0;
        }</p>
<p>        .hashtags {
            color: #6f42c1;
            font-size: 1.1em;
        }</p>
<p>        .schedule-button {
            display: inline-block;
            background: linear-gradient(90deg, #42c1b3, #6f42c1);
            color: #fff;
            padding: 10px 30px;
            border-radius: 4px;
            font-weight: bold;
            margin-top: 15px;
        }</p>
<p>        .event-description {
            font-size: 1.1em;
            line-height: 1.6;
        }</p>
<p>        .event-description .link {
            color: #0098d4;
            text-decoration: none;
        }</p>
<p>        .event-description .link:hover {
            text-decoration: underline;
        }</p>
<p>        .register-button {
            display: inline-block;
            background: rgb(0,145,255);
            padding: 12px 18px;
            border-radius: 25px;
            margin-top: 20px;
        }</p>
<p>        .register-button a {
            color: #fff!important;
            text-decoration: none;
            font-weight: bold;
        }</p>
<p>        .register-button:hover {
            background: rgb(0,145,255);
        }</p>
<p>        .section-divider {
            border: 0;
            border-top: 1px solid #eee;
            margin: 40px 0;
            width: 100%;
        }
    </style>
<p></head></p>
<p><body></p>
<div class="intro">
       This week shows how quickly AI is moving from frontier research into tools, devices, and physical systems people can actually use. Anthropic’s latest analysis puts recursive self-improvement back in focus, with Claude now writing more than 80% of the code merged into its production systems. That progress is already reaching users: OpenAI extended Codex with Sites, turning plain-language prompts into interactive websites and apps, while Google released Gemma 4 12B, a multimodal model small enough to run locally on a laptop. On Windows, Microsoft’s OpenClaw and NVIDIA RTX Spark point to a new layer of local-agent infrastructure, combining secure execution, enterprise control, and hardware purpose-built for running AI agents on personal machines. And with Unitree’s H2 Plus, the story moves into robotics, as AI begins to step out of software workflows and into machines that move.
    </div>
<div class="contents">
<h2>Contents</h2>
<ul>
            <!-- 

<li><a href="#education">New features of our membership</a></li>

 --></p>
<li>
<a   href="#news5426400">Anthropic Says Claude’s 80% Code Share Points to AI Self-Improvement<br />
</a>
</li>
<li>
<a   href="#news3743628">OpenAI Launches Sites for Codex to Build Interactive Apps<br />
</a>
</li>
<li>
<a   href="#news1590707">Google Releases Gemma 4 12B for Local Multimodal AI<br />
</a>
</li>
<li>
<a   href="#news1287421">Microsoft Brings OpenClaw to Windows with Security Containers<br />
</a>
</li>
<li>
<a   href="#news383802">Microsoft and NVIDIA Launch RTX Spark for Windows AI PCs<br />
</a>
</li>
<li>
<a   href="#news2772">Unitree Announces H2 Plus Humanoid Robot on NVIDIA Isaac GR00T<br />
</a>
</li>
</ul></div>
<p>    <!-- 

<div class="education-section" id="education">
        

<h2>New features of our membership</h2>


        

<div class="promo-banner">
            <img decoding="async" src="https://cdn.ainative.foundation/uploads/44ba8180-21ff-466a-902f-f35f707126c3.jpeg"
                alt="Linux Foundation Education Promotion - Strive to Thrive in '25" class="banner-image">

        </div>



        

<div class="education-points">
            

<ul>
                

<li><strong>Unlock New Possibilities with Our Latest Features!</strong></li>


                

<li><strong>New Workflow Module: </strong> Enhance your productivity with our newly added workflow module in the membership system.</li>


                

<li><strong>Exclusive Access:</strong> Explore featured and premium automated workflows designed to optimize efficiency.</li>


                

<li><strong>Join & Explore:</strong> Register today to discover powerful solutions that streamline your work and boost performance!</li>


            </ul>


            

<p style="display: inline-block; background: #000; padding: 12px 30px; border-radius: 25px; margin-top: 20px;"><a href="https://member.ainativefoundation.org/aiflow/selection" style="color: #fff!important; text-decoration: none; font-weight: bold;">REGISTER NOW</a></p>


        </div>


    </div>



    

<hr class="section-divider"> --><br />
<!-- 列表文章 --></p>
<div class="kubecon-section" id="news5426400">
<h2>Anthropic Says Claude’s 80% Code Share Points to AI Self-Improvement</h2>
<div class="event-banner">
<div class="event-graphics">
            <img decoding="async" src="https://cdn.ainative.foundation/20260605_img_en_claude.png"
                alt="KubeCon + CloudNativeCon Europe 2025 London" class="event-image">
        </div>
</p></div>
<div class="event-description">
<p>Anthropic said Claude now authors more than 80% of the code merged into the company’s production systems as of May 2026, up from single digits before February 2025. The company says this trend points to a possible path toward recursive self-improvement, where AI systems could eventually help build more capable successors. Anthropic engineers now ship about 8 times as much code per quarter compared with 2021–2025, with Claude handling coding tasks that previously took humans hours to complete. The company warns that AI systems accelerating their own development could move faster than many institutions are prepared to manage.</p>
<p ><a href="https://www.anthropic.com/institute/recursive-self-improvement" style="color: #15c!important; text-decoration: none; font-weight: bold;border-bottom:2px solid;">Read More ⟶</a></p>
</p></div>
</div>
<hr class="section-divider"><!-- 列表文章 --></p>
<div class="kubecon-section" id="news3743628">
<h2>OpenAI Launches Sites for Codex to Build Interactive Apps</h2>
<div class="event-banner">
<div class="event-graphics">
            <img decoding="async" src="https://cdn.ainative.foundation/image/20260604_3b25f25cea954fe0a45420bcee0eec27.jpg"
                alt="KubeCon + CloudNativeCon Europe 2025 London" class="event-image">
        </div>
</p></div>
<div class="event-description">
<p>OpenAI announced Sites, a new Codex feature that allows users to create, share, and host interactive websites and apps through natural language prompts. Sites is rolling out in preview to Business and Enterprise customers first. Users can transform ideas, analysis, and plans into dashboards, planners, review workspaces, project boards, and other lightweight tools that can be shared with workspace members via URL. The feature expands Codex from a coding assistant into a tool for creating interactive workspaces and usable software experiences.</p>
<p ><a href="https://openai.com/index/codex-for-every-role-tool-workflow/" style="color: #15c!important; text-decoration: none; font-weight: bold;border-bottom:2px solid;">Read More ⟶</a></p>
</p></div>
</div>
<hr class="section-divider"><!-- 列表文章 --></p>
<div class="kubecon-section" id="news1590707">
<h2>Google Releases Gemma 4 12B for Local Multimodal AI</h2>
<div class="event-banner">
<div class="event-graphics">
            <img decoding="async" src="https://cdn.ainative.foundation/image/20260604_ff9be6d034e94c7fa754bd4932d0cb43.jpg"
                alt="KubeCon + CloudNativeCon Europe 2025 London" class="event-image">
        </div>
</p></div>
<div class="event-description">
<p>Google DeepMind released Gemma 4 12B, a 12-billion-parameter multimodal model designed to bring agentic multimodal intelligence directly to laptops. The model processes text, images, and audio with a unified architecture, without separate multimodal encoders. Gemma 4 12B can run locally with 16 GB of VRAM or unified memory and is released under an Apache 2.0 license. It is also the first mid-sized Gemma model to support native audio input, while delivering benchmark performance close to Google’s larger 26B model with a smaller memory footprint.</p>
<p ><a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/" style="color: #15c!important; text-decoration: none; font-weight: bold;border-bottom:2px solid;">Read More ⟶</a></p>
</p></div>
</div>
<hr class="section-divider"><!-- 列表文章 --></p>
<div class="kubecon-section" id="news1287421">
<h2>Microsoft Brings OpenClaw to Windows with Security Containers</h2>
<div class="event-banner">
<div class="event-graphics">
            <img decoding="async" src="https://cdn.ainative.foundation/image/20260604_759b90bc6567479ba887ea7225c87a76.jpg"
                alt="KubeCon + CloudNativeCon Europe 2025 London" class="event-image">
        </div>
</p></div>
<div class="event-description">
<p>Microsoft announced that OpenClaw now runs securely on Windows using Microsoft Execution Containers technology. The integration allows OpenClaw’s node and gateway to run with policy-driven containment for enterprise and local agent deployments. Microsoft also introduced a Windows companion app for setting up new Claws or connecting to existing ones. Together with Agent 365, the update is designed to help enterprises observe, govern, and secure AI agents on Windows.</p>
<p ><a href="https://blogs.windows.com/windowsdeveloper/2026/06/02/windows-platform-security-for-ai-agents/" style="color: #15c!important; text-decoration: none; font-weight: bold;border-bottom:2px solid;">Read More ⟶</a></p>
</p></div>
</div>
<hr class="section-divider"><!-- 列表文章 --></p>
<div class="kubecon-section" id="news383802">
<h2>Microsoft and NVIDIA Launch RTX Spark for Windows AI PCs</h2>
<div class="event-banner">
<div class="event-graphics">
            <img decoding="async" src="https://cdn.ainative.foundation/image/20260602_gj_img_microsoft.png"
                alt="KubeCon + CloudNativeCon Europe 2025 London" class="event-image">
        </div>
</p></div>
<div class="event-description">
<p>Microsoft and NVIDIA announced NVIDIA RTX Spark, a platform designed to power highly efficient thin-and-light Windows PCs for developers, creators, and power users. RTX Spark systems are purpose-built for the new wave of AI agents, with up to 128 GB of unified memory and integrated CPU/GPU/memory architecture optimized for local agentic workloads. The platform will support NVIDIA OpenShell on Windows, built on new Windows security and containment features for running agents locally. RTX Spark-powered laptops and small form factor desktop PCs from Microsoft Surface, ASUS, Dell, HP, Lenovo, MSI, and other OEMs are expected to launch beginning this fall.</p>
<p ><a href="https://blogs.windows.com/windowsexperience/2026/05/31/introducing-a-powerful-new-chapter-for-windows-pcs-accelerated-by-nvidia-rtx-spark/" style="color: #15c!important; text-decoration: none; font-weight: bold;border-bottom:2px solid;">Read More ⟶</a></p>
</p></div>
</div>
<hr class="section-divider"><!-- 列表文章 --></p>
<div class="kubecon-section" id="news2772">
<h2>Unitree Announces H2 Plus Humanoid Robot on NVIDIA Isaac GR00T</h2>
<div class="event-banner">
<div class="event-graphics">
            <img decoding="async" src="https://cdn.ainative.foundation/image/20260603_e262cc6f1e5d40afb9579771608cc1cb.jpg"
                alt="KubeCon + CloudNativeCon Europe 2025 London" class="event-image">
        </div>
</p></div>
<div class="event-description">
<p>Unitree announced H2 Plus, a humanoid robot platform built on NVIDIA Isaac GR00T. The system combines Unitree’s H2 humanoid body, dual SharpaWave tactile five-finger hands, NVIDIA Jetson Thor onboard compute, and Isaac GR00T open software. It comes with open models, simulation frameworks, and validated workflows from data to deployment. The platform is designed to support humanoid reasoning, learning, multitask behavior, and real-time onboard robot inference and control.</p>
<p ><a href="https://www.unitree.com/H2plus" style="color: #15c!important; text-decoration: none; font-weight: bold;border-bottom:2px solid;">Read More ⟶</a></p>
</p></div>
</div>
<hr class="section-divider">
<p></body></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
