Digest AI

Month: March 2026

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

Posted on March 2, 2026 by DigestAI

Draft TL;DR

HGPO is a training tweak for long-horizon LLM agents. It fixes a subtle bias in stepwise group-based RL by ensuring that steps are compared under consistent historical context. It does this by building a hierarchy of groups that share increasing amounts of history, then combining their advantage estimates.

What this is about

Group-based RL…
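The excerpt does not give HGPO's exact formulas, so the following is only a minimal sketch of the idea it describes, under stated assumptions. It assumes that a level-k group contains steps taken from the same current state whose last k history items also match (k = 0 compares by state alone; larger k forces more shared context), that each group's baseline is its mean return, and that the per-level advantages are averaged uniformly. The names `Step`, `group_key`, `hierarchical_advantages`, `max_level` and `min_group_size` are hypothetical, and the uniform averaging is an assumed combination rule, not necessarily the one used in the paper.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Step:
    """One agent step (hypothetical record): its context and the return that followed."""
    history: tuple  # earlier (observation, action) items, oldest first
    state: Hashable  # current observation the action was chosen from
    ret: float  # return from this step onward


def group_key(step: Step, k: int) -> tuple:
    """Level-k key (assumption): same current state plus the same last k history items."""
    suffix = step.history[-k:] if k > 0 else ()
    return (step.state, suffix)


def hierarchical_advantages(steps: Sequence[Step], max_level: int,
                            min_group_size: int = 2) -> list:
    """Average group-relative advantages over levels 0..max_level.

    Assumption: each level contributes (return - group mean return); groups
    smaller than min_group_size give no baseline and are skipped at that level.
    """
    per_level = []
    for k in range(max_level + 1):
        # Bucket step indices by their level-k context key.
        groups = defaultdict(list)
        for i, s in enumerate(steps):
            groups[group_key(s, k)].append(i)
        level_adv = [None] * len(steps)
        for members in groups.values():
            if len(members) < min_group_size:
                continue
            baseline = mean(steps[i].ret for i in members)
            for i in members:
                level_adv[i] = steps[i].ret - baseline
        per_level.append(level_adv)

    # Uniform average over the levels that produced an estimate (assumed weighting).
    combined = []
    for i in range(len(steps)):
        vals = [lvl[i] for lvl in per_level if lvl[i] is not None]
        combined.append(mean(vals) if vals else 0.0)
    return combined


# Toy batch: four steps from the same state "s", reached via two different histories.
steps = [
    Step(history=("a",), state="s", ret=1.0),
    Step(history=("a",), state="s", ret=0.0),
    Step(history=("b",), state="s", ret=2.0),
    Step(history=("b",), state="s", ret=1.0),
]
print(hierarchical_advantages(steps, max_level=1))  # → [0.25, -0.75, 0.75, -0.25]
```

With state-only grouping (level 0), the first and fourth steps both get an advantage of 0, even though each is the better of the two steps that share its history. The level-1 groups compare each step only against steps with the same recent history, which pulls these estimates back toward a consistent-context comparison.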

Categories

  • Agents (17)
  • Claude (4)
  • CUDA (1)
  • LLM (17)
  • MCP (2)
  • openAI (3)
  • openClaw (4)
  • Programming (8)
  • Uncategorized (1)

Recent Posts

  • RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
  • RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection
  • MA-CoNav: A Master-Slave Multi-Agent Framework with Hierarchical Collaboration and Dual-Level Reflection for Long-Horizon Embodied VLN
  • An AI Agent Published a Hit Piece on Me – The Operator Came Forward
  • CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework

Archives

  • March 2026

© 2026 Digest AI