TL;DR RIVA proposes a two-agent setup for infrastructure verification that stays reliable even when observability tools return wrong or empty outputs. The key idea is cross-validation: require multiple independent diagnostic paths before concluding “drift” (or “no drift”). On the AIOpsLab benchmark, RIVA improves accuracy versus a baseline ReAct-style agent, especially under simulated tool failures. What…
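The cross-validation idea can be sketched in a few lines: only commit to a "drift" (or "no drift") verdict when multiple independent diagnostic probes agree, treating failed or empty tool outputs as abstentions. This is a minimal illustrative sketch, not RIVA's actual implementation; all function and parameter names here are assumptions.

```python
from typing import Callable, List, Optional

def cross_validated_drift(probes: List[Callable[[], Optional[bool]]],
                          min_agreeing: int = 2) -> Optional[bool]:
    """Hypothetical sketch of RIVA-style cross-validation.

    Each probe is an independent diagnostic path returning True (drift),
    False (no drift), or None (tool failed / empty output, ignored).
    A verdict is issued only when at least `min_agreeing` probes concur.
    """
    votes = [v for p in probes if (v := p()) is not None]
    if len(votes) < min_agreeing:
        return None  # too few usable signals: withhold judgment
    if votes.count(True) >= min_agreeing:
        return True   # multiple independent paths saw drift
    if votes.count(False) >= min_agreeing:
        return False  # multiple paths confirm no drift
    return None       # conflicting signals: withhold judgment

# Example: one probe's tool fails (None), two independent probes agree.
result = cross_validated_drift([lambda: True, lambda: None, lambda: True])
```

The point of the abstention path is exactly the failure mode the paper targets: a single wrong or empty tool output cannot by itself flip the agent's conclusion.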
A Systematic Study of LLM-Based Architectures for Automated Patching
TL;DR This study compares four LLM-based automated patching architectures on the same benchmark of 19 real-world Java vulnerabilities (AIxCC). The headline result: general-purpose code agents (specifically Claude Code) patched 16/19, outperforming more patch-specific workflows in this setup. The authors argue architecture + iteration depth can matter as much as (or more than) raw model…
OpenAI Agrees with Dept. of War to Deploy Models in Their Classified Network
TL;DR OpenAI reportedly agreed to deploy models on classified U.S. military networks, an inflection point for how frontier AI capabilities move into high-stakes government environments. What this is about A public statement (and the ensuing discussion) indicates OpenAI is moving toward deployments in classified networks. The conversation also contrasts different labs' stances on defense and classified deployments….