TL;DR: Andrej Karpathy highlighted “Claws” as a distinct layer above LLM agents: systems that make agents messaging-native and operationally useful. He pointed to OpenClaw as an example of that category.

What this is about: In a short post, Karpathy frames a “Claw” as an architectural layer that sits on top of an agent runtime and connects…
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
TL;DR: CUDA Agent applies large-scale, multi-turn agentic reinforcement learning to teach models to write and iteratively optimize CUDA kernels, aiming for performance that competes with strong compiler baselines.

What this is about: The paper proposes an “agentic RL” training approach in which a model works in a sandbox: profile, write kernels, compile, run, observe performance, and refine…
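The profile/compile/run/refine loop described above can be sketched as a generic control loop. This is a hypothetical sketch of the loop's shape only (the function names are assumptions, and the LLM, compiler, and profiler are stubbed out with toy stand-ins, not the paper's code):

```python
import random

def propose_kernel(history):
    """Stand-in for the LLM: propose a new kernel variant from past feedback.
    Here we just mutate one tuning parameter; the real agent emits CUDA source."""
    best = min(history, key=lambda h: h["latency_ms"]) if history else {"tile": 8}
    return {"tile": max(1, best["tile"] + random.choice([-4, -2, 2, 4]))}

def compile_and_benchmark(candidate):
    """Stand-in for nvcc + profiling: a toy latency model with an optimum at tile=32."""
    return abs(candidate["tile"] - 32) + 1.0

def optimize(turns=20, seed=0):
    """Multi-turn loop: propose, compile/run, record feedback, refine."""
    random.seed(seed)
    history = []
    for _ in range(turns):
        cand = propose_kernel(history)           # "write kernel"
        latency = compile_and_benchmark(cand)    # "compile, run, observe"
        history.append({**cand, "latency_ms": latency})  # feedback for refinement
    return min(history, key=lambda h: h["latency_ms"])
```

The point of the sketch is the feedback channel: every turn's measured latency goes back into the history the next proposal conditions on.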
Parallel Coding Agents with tmux and Markdown Specs
TL;DR: A practical, production-tested way to run multiple AI coding agents in parallel: use tmux to split “PM / Planner / Worker” roles, and use a lightweight Markdown “Feature Design (FD)” spec as the handoff artifact so agents don’t step on each other.

What this is about: Manuel Schipper describes a workflow for managing 4–8…
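One way to read the tmux side of this setup is as a small launcher: one pane per role, each agent pointed at the shared FD spec. A hypothetical sketch (the session name, the `agent` command, and the spec path are assumptions, not from the article); it builds the tmux commands rather than hard-coding a layout:

```python
import shlex

def tmux_layout(session="agents", fd_spec="specs/feature-design.md",
                roles=("pm", "planner", "worker")):
    """Build the tmux commands for a PM / Planner / Worker split.

    Each role gets its own pane and is told which Markdown FD spec to treat
    as the handoff artifact. Returns the shell commands as a list of strings."""
    cmds = [f"tmux new-session -d -s {session}"]
    for i, role in enumerate(roles):
        if i > 0:  # the first role reuses the session's initial pane
            cmds.append(f"tmux split-window -t {session}")
        agent_cmd = f"agent --role {role} --spec {fd_spec}"
        cmds.append(f"tmux send-keys -t {session} {shlex.quote(agent_cmd)} Enter")
    cmds.append(f"tmux select-layout -t {session} even-vertical")
    return cmds
```

Running these (e.g. via `subprocess.run(shlex.split(cmd))`) yields one session with three stacked panes; `agent` is a placeholder for whatever CLI launches each coding agent.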
WebMCP Is Available for Early Preview
TL;DR: Chrome’s WebMCP is an early proposal to make websites “agent-ready” by letting publishers expose structured actions (tools) that AI agents can invoke, rather than relying on brittle DOM clicking and scraping.

What this is about: Today’s web agents succeed or fail based on UI guesswork: finding the right button, filling the right field, surviving…
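The contrast with DOM-driven agents can be made concrete with a toy tool registry. This is a hypothetical sketch of the idea only, not the actual WebMCP API (`register_tool`, the schema shape, and the `add_to_cart` example are all assumptions):

```python
import json

TOOLS = {}

def register_tool(name, description, params):
    """A site declares a structured action instead of exposing only a UI."""
    def wrap(fn):
        TOOLS[name] = {"description": description, "params": params, "fn": fn}
        return fn
    return wrap

@register_tool("add_to_cart",
               "Add a product to the shopping cart",
               {"sku": "string", "quantity": "integer"})
def add_to_cart(sku, quantity=1):
    return {"status": "ok", "sku": sku, "quantity": quantity}

def invoke(name, arguments_json):
    """What an agent would do: call the declared tool with checked arguments,
    instead of locating buttons and form fields in the DOM."""
    tool = TOOLS[name]
    args = json.loads(arguments_json)
    unknown = set(args) - set(tool["params"])
    if unknown:
        raise ValueError(f"unexpected arguments: {unknown}")
    return tool["fn"](**args)
```

The brittleness argument falls out of the shape: `invoke` fails loudly on a bad argument name, whereas a DOM click on the wrong button fails silently.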
MCP Is Dead. Long Live the CLI.
TL;DR: A contrarian take: instead of adopting MCP everywhere, treat the CLI as the universal tool interface for agents; it’s debuggable, composable, and already “native” to how LLMs learned to operate.

What this is about: The post argues that MCP adds a new layer of complexity (servers, lifecycle, transport logs, auth wrappers) for many tasks that…
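The "CLI as tool interface" position is easy to demonstrate: the whole tool layer can be a thin subprocess wrapper whose every call is replayable by hand. A minimal sketch (the allow-list and timeout policy are my assumptions, not from the post):

```python
import shlex
import subprocess

ALLOWED = {"ls", "grep", "git", "echo", "wc"}  # hypothetical allow-list

def run_cli_tool(command, timeout=30):
    """Run one CLI command on the agent's behalf and return a plain dict.

    Compared to an MCP server there is no extra lifecycle or transport:
    the 'tool call' is argv in, (exit code, stdout, stderr) out, and the
    same command can be pasted into a terminal while debugging."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return {"exit_code": 126, "stdout": "",
                "stderr": f"blocked: {argv[:1]}"}
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {"exit_code": proc.returncode,
            "stdout": proc.stdout, "stderr": proc.stderr}
```

Composability comes for free too: the agent can pipe one tool's stdout into the next command string, exactly as a human would.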
RF-Agent: Automated Reward Function Design via Language Agent Tree Search
TL;DR: RF-Agent treats reward-function design as a search problem: an LLM proposes reward code, an RL training run scores it, and Monte Carlo Tree Search (MCTS) decides what to try next. The key idea is to reuse the entire history of attempts and feedback rather than sampling one-off reward candidates.

What this is…
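The search loop can be sketched as a textbook UCT loop over candidate reward functions. This is a toy illustration of the idea only: the proposal step and the scoring function stand in for the LLM and the RL training run, and none of the names below are RF-Agent's actual components:

```python
import math
import random

class Node:
    def __init__(self, candidate, parent=None):
        self.candidate = candidate       # a number standing in for reward code
        self.parent, self.children = parent, []
        self.visits, self.value = 0, 0.0

def ucb1(node, c=1.4):
    """Balance exploiting good candidates against exploring fresh ones."""
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def train_and_score(candidate):
    """Stand-in for running RL with this reward and measuring task success."""
    return -abs(candidate - 3.0)         # toy objective, optimum at 3.0

def all_nodes(node):
    yield node
    for child in node.children:
        yield from all_nodes(child)

def search(iterations=200, seed=0):
    random.seed(seed)
    root = Node(candidate=0.0)
    for _ in range(iterations):
        node = root
        while node.children:             # selection: follow UCB1 down the tree
            node = max(node.children, key=ucb1)
        child = Node(node.candidate + random.uniform(-1, 1), parent=node)
        node.children.append(child)      # expansion: mutate a past candidate
        score = train_and_score(child.candidate)
        while child:                     # backprop: all ancestors share credit
            child.visits += 1
            child.value += score
            child = child.parent
    return max(all_nodes(root), key=lambda n: train_and_score(n.candidate)).candidate
```

The history reuse is the backpropagation step: every evaluated candidate updates the statistics along its whole lineage, so later proposals are steered by all earlier attempts rather than drawn independently.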
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
TL;DR: HGPO is a training tweak for long-horizon LLM agents: it fixes a subtle bias in stepwise group-based RL by ensuring steps are compared under consistent historical context. It does this by building a hierarchy of groups that share increasing amounts of history, then combining their advantage estimates.

What this is about: Group-based RL…
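A toy numeric illustration of the idea, under my reading of the summary (this is not the paper's actual estimator): group-based advantages subtract a within-group mean reward, so a step is only compared fairly against rollouts that share its history; the hierarchy averages baselines computed at several prefix-sharing levels.

```python
from collections import defaultdict

def group_advantages(rollouts, level):
    """Baseline each reward against rollouts sharing the same history
    prefix of length `level` (one group per distinct prefix)."""
    groups = defaultdict(list)
    for r in rollouts:
        groups[tuple(r["history"][:level])].append(r["reward"])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return [r["reward"] - means[tuple(r["history"][:level])] for r in rollouts]

def hierarchical_advantages(rollouts, levels=(0, 1, 2)):
    """Combine estimates from groups sharing increasing amounts of history:
    level 0 pools everything; higher levels compare only rollouts whose
    past steps match, removing the context-mismatch bias."""
    per_level = [group_advantages(rollouts, lv) for lv in levels]
    return [sum(vals) / len(levels) for vals in zip(*per_level)]

rollouts = [
    {"history": ["a", "x"], "reward": 1.0},   # harder "a" context
    {"history": ["a", "y"], "reward": 0.0},
    {"history": ["b", "x"], "reward": 3.0},   # easier "b" context
    {"history": ["b", "y"], "reward": 2.0},
]
```

In this toy data the level-0 (fully pooled) baseline penalizes both "a"-context rollouts just for their harder starting point, while at level 1 the "a" and "b" branches each get the same ±0.5 within-group advantages: that context-consistency is the bias fix the hierarchy is after.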