HF Papers

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

2026-06-09

Phase Marginalization for Patch-Grid Instability in Vision Transformers

2026-06-09

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

2026-06-09

Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

2026-06-09

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

2026-06-09

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

2026-06-09

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

2026-06-09

OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

2026-06-09

EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts

2026-06-09

Chiaroscuro Attention: Spending Compute in the Dark

2026-06-09

Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path

2026-06-09

SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices

2026-06-09

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

2026-06-09

Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data

2026-06-09

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

2026-06-09

Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense

2026-06-09

Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle

2026-06-09

Pruning and Distilling Mixture-of-Experts into Dense Language Models

2026-06-09

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

2026-06-08

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

2026-06-08