Digest AI
RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection

Posted on March 5, 2026 by DigestAI

TL;DR

  • RIVA proposes a two-agent setup for infrastructure verification that stays reliable even when observability tools return wrong or empty outputs.
  • The key idea is cross-validation: require multiple independent diagnostic paths before concluding “drift” (or “no drift”).
  • On the AIOpsLab benchmark, RIVA improves accuracy versus a baseline ReAct-style agent, especially under simulated tool failures.

What this is about

Infrastructure-as-code (IaC) makes provisioning repeatable, but real systems drift: manual tweaks, upgrades, or mistakes push reality away from the spec. LLM agents can help analyze telemetry and verify config state—but only if they can handle a common production failure mode: tools that look successful but return misleading results (timeouts, empty strings, stale data).

Key points

  • Two specialized agents: a Verifier Agent decides what should be checked; a Tool Generation Agent runs diverse tool calls and logs results in a shared history.
  • Cross-validation threshold: the verifier only reaches a conclusion after K distinct diagnostic paths have reported on a property (K=2 in the evaluation), reducing over-reliance on any one flaky tool.
  • Tool call history matters: contradictions across tool outputs are surfaced instead of silently accepted.
  • Results: on AIOpsLab, RIVA improves task accuracy compared to a baseline agent, with a larger lift when tools are intentionally made unreliable.
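The cross-validation rule above can be sketched in a few lines. This is an illustrative reconstruction, not code from the paper: the names (`ToolResult`, `Verifier`) and the record schema are assumptions, but the logic matches the described behavior, i.e. withhold judgment until K independent diagnostic paths have reported, and surface contradictions rather than picking a side.

```python
from dataclasses import dataclass, field

@dataclass
class ToolResult:
    property_name: str    # e.g. "replica_count" (hypothetical property)
    diagnostic_path: str  # e.g. "kubectl" vs. "prometheus" (hypothetical tools)
    observed_drift: bool

@dataclass
class Verifier:
    k: int = 2  # required number of independent paths (K=2 in the evaluation)
    history: list = field(default_factory=list)  # shared tool-call history

    def record(self, result: ToolResult) -> None:
        self.history.append(result)

    def conclude(self, property_name: str):
        """Return 'drift'/'no drift' only if K distinct paths agree; else None."""
        results = [r for r in self.history if r.property_name == property_name]
        # De-duplicate by diagnostic path so one flaky tool cannot vote twice.
        by_path = {r.diagnostic_path: r.observed_drift for r in results}
        if len(by_path) < self.k:
            return None  # not enough independent evidence yet
        verdicts = set(by_path.values())
        if len(verdicts) > 1:
            return "contradiction"  # surface the disagreement, don't guess
        return "drift" if verdicts.pop() else "no drift"
```

Note how a single tool reporting twice never satisfies the threshold: the de-duplication by `diagnostic_path` is what makes the K signals independent.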

Why it matters

If you deploy agents that call external systems (APIs, CLIs, dashboards), tool unreliability isn’t an edge case—it’s normal. RIVA’s design is a reminder that “agent safety” isn’t only about model behavior; it’s also about system reliability under partial failure.

Practical takeaways

  • Design agent workflows to verify via multiple independent signals (two tools, two endpoints, logs + metrics, etc.).
  • Log tool calls and results in a structured way so contradictions are detectable.
  • Separate “tool execution” from “final judgment” to reduce brittle, single-pass reasoning failures.
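A minimal sketch of the logging takeaway, assuming a simple dict-based record format (the field names here are made up, not a real schema): store every tool call as a structured entry, then scan the log for properties where different tools reported different values.

```python
import json
import time

def log_tool_call(log: list, tool: str, prop: str, value) -> None:
    """Append one structured record per tool call to a shared log."""
    log.append({
        "ts": time.time(),
        "tool": tool,
        "property": prop,
        "value": value,
    })

def find_contradictions(log: list) -> dict:
    """Map each property to its set of distinct reported values,
    keeping only properties where tools disagree."""
    seen: dict = {}
    for entry in log:
        # json.dumps gives a hashable, comparable form for any JSON-able value
        seen.setdefault(entry["property"], set()).add(json.dumps(entry["value"]))
    return {p: vals for p, vals in seen.items() if len(vals) > 1}
```

Because contradictions are computed from the log rather than inside the agent's reasoning loop, a final-judgment step can inspect them explicitly instead of silently trusting whichever tool answered last.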

Caveats / what to watch

  • Evaluation is benchmark-based; generalization to other IaC stacks and real org tooling remains to be validated.
  • Even cross-validation can fail if all tools share the same hidden dependency (e.g., the same source of stale truth).

Links

  • arXiv: RIVA (abs)
  • PDF
Category: Agents, LLM, openAI, Programming
