TL;DR
The paper studies how tool-augmented LLM agents can deanonymize authors of anonymized text using stylometry—raising practical risks for whistleblowers, journalists, and researchers relying on pseudonymity.
What this is about
The authors introduce a stylometry-assisted agent approach (SALA) that combines quantitative stylometric features (lexical/syntactic/readability/semantic signals) with LLM reasoning to narrow down likely authors, and they analyze how effective it can be under different conditions.
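To make the feature side of such a pipeline concrete, here is a minimal sketch of the kind of lexical/syntactic signals stylometric tooling typically computes. The function name, the feature choices, and the function-word list are illustrative assumptions, not the paper's actual SALA feature set.

```python
import re
from collections import Counter

# Illustrative function-word list; topic-independent words like these are a
# classic stylometric signal. (Assumption: not the paper's actual list.)
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "was", "it"}

def stylometric_features(text: str) -> dict:
    """Compute a few simple lexical/syntactic features from raw text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words) or 1
    counts = Counter(words)
    features = {
        "avg_word_len": sum(len(w) for w in words) / n,     # lexical signal
        "avg_sentence_len": n / (len(sentences) or 1),      # syntactic proxy
        "type_token_ratio": len(counts) / n,                # vocabulary richness
    }
    # Relative frequency of each function word (hard to suppress by rewriting).
    for w in sorted(FUNCTION_WORDS):
        features[f"fw_{w}"] = counts[w] / n
    return features
```

A vector like this, computed once for the anonymous text and once per candidate author, is what the LLM-plus-tools agent would reason over instead of guessing from the raw prose alone.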
Key points
- Tool augmentation matters: combining feature extraction with LLM reasoning can outperform “pure prompting” approaches and reduce hallucination-prone guesswork.
- Candidate retrieval: a database warm-up step can substantially increase the chance the true author is in the candidate set (as reported by the paper).
- Dual-use: the same pipeline can be repurposed from attribution to targeted anonymization/rewriting strategies.
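The candidate-retrieval step above can be sketched as a simple nearest-neighbor ranking: compare a feature vector of the anonymous sample against vectors built from each candidate's prior writing. The author names and vectors below are made up for illustration, and cosine similarity over raw feature vectors stands in for whatever retrieval the paper actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(query, candidates):
    """Rank candidate authors by similarity to the anonymous sample."""
    return sorted(
        ((name, cosine(query, vec)) for name, vec in candidates.items()),
        key=lambda t: t[1],
        reverse=True,
    )

# Toy data: e.g. (avg_word_len, avg_sentence_len, type_token_ratio).
anon = [4.6, 18.2, 0.52]
corpus = {
    "author_a": [5.1, 12.0, 0.61],
    "author_b": [4.5, 18.5, 0.50],
    "author_c": [3.9, 25.0, 0.44],
}
```

With this toy data, `rank_candidates(anon, corpus)` places `author_b` first; in the agent setting, the LLM would then reason over the top-ranked candidates rather than the full corpus, which is why a well-populated candidate database matters so much.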
Why it matters
As LLM agents get better at coordinating tools, risks shift from “the model guesses” to “the model runs an analysis pipeline.” Stylometry has long been a deanonymization vector; agentic tooling can make it cheaper, faster, and more accessible—changing the threat model for anyone publishing sensitive writing under a pseudonym.
Practical takeaways
- If anonymity is critical, assume an adversary can run automated stylometric analysis—don’t rely on “light” rewriting.
- Organizations hosting anonymous submissions should consider threat-model guidance and mitigations (e.g., standardized writing templates, editorial rewriting, strict metadata controls).
- Researchers should treat stylometry + agents as a first-class safety topic in LLM deployments.
Caveats / what to watch
- Results depend heavily on the size and quality of the candidate-author corpus, and on how much text is available both for the anonymous sample and for each candidate.
- “Anonymization” is not a binary; defenses often degrade under adaptive attacks.