March, 2026 - Digest AI

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

Posted on March 2, 2026 by DigestAI

Draft

TL;DR

HGPO is a training tweak for long-horizon LLM agents: it fixes a subtle bias in stepwise group-based RL by ensuring steps are compared under consistent historical context. It does this by building a hierarchy of groups that share increasing amounts of history, then combining their advantage estimates.

What this is about

Group-based RL…
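To make the TL;DR's mechanism concrete, here is a minimal sketch of one way a hierarchy of groups could be built and combined. It is an illustration under stated assumptions, not HGPO's actual formulation: the excerpt does not give the grouping rule, the normalization, or the combination weights. The function name `hierarchical_advantages`, the parameters `max_k` and `alpha`, the mean-baseline advantage, and the `(k + 1) ** alpha` weighting are all hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import mean


def hierarchical_advantages(steps, max_k=2, alpha=1.0):
    """Combine group-relative advantages over groups sharing more and more history.

    steps: list of (history, state, ret), where `history` is a tuple of the
           states that preceded `state` and `ret` is the return of that step.
    Level k groups steps that share the current state AND their last k history
    entries (level 0 ignores history entirely).
    NOTE: the mean baseline and the (k + 1) ** alpha weights are assumptions.
    """
    weights = [(k + 1) ** alpha for k in range(max_k + 1)]
    combined = [0.0] * len(steps)

    for k, weight in enumerate(weights):
        # Bucket step indices by (last k history entries, current state).
        groups = defaultdict(list)
        for i, (history, state, _) in enumerate(steps):
            suffix = history[len(history) - k:] if 0 < k <= len(history) else (
                history if k else ()
            )
            groups[(suffix, state)].append(i)

        # Advantage at this level: return minus the group's mean return.
        # A step alone in its group gets 0, so it contributes nothing here.
        for members in groups.values():
            baseline = mean(steps[i][2] for i in members)
            for i in members:
                combined[i] += weight * (steps[i][2] - baseline)

    total = sum(weights)
    return [value / total for value in combined]


# Four steps that all reach state "s2" but through different histories.
demo_steps = [
    (("s0", "a"), "s2", 1.0),
    (("s0", "a"), "s2", 0.0),
    (("s0", "b"), "s2", 0.0),
    (("s1", "b"), "s2", 1.0),
]
demo_advantages = hierarchical_advantages(demo_steps, max_k=2, alpha=1.0)
```

In this toy data, the first two steps share their full history, so every level compares them against each other. The last two steps only share the current state and their most recent history entry, so the deepest level contributes nothing for them and their combined advantage is smaller in magnitude. Weighting deeper levels more heavily is one way to favor comparisons made under a consistent historical context.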