HF Papers

The Role of Feedback Alignment in Self-Distillation

2026-06-10

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

2026-06-10

Decentralized Multi-Agent Systems with Shared Context

2026-06-10

PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf

2026-06-10

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

2026-06-10

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

2026-06-10

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

2026-06-10

In-Context Multiple Instance Learning

2026-06-10

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

2026-06-09

Agents' Last Exam

2026-06-09

On the Geometry of On-Policy Distillation

2026-06-09

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

2026-06-09

Latent Spatial Memory for Video World Models

2026-06-09

CoVEBench: Can Video Editing Models Handle Complex Instructions?

2026-06-09

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

2026-06-09

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

2026-06-09

Human Psychometric Questionnaires Mischaracterize LLM Behavior

2026-06-09

Echo-Memory: A Controlled Study of Memory in Action World Models

2026-06-09

End-to-End Context Compression at Scale

2026-06-09

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

2026-06-09