TL;DR
The paper studies how tool-augmented LLM agents can deanonymize authors of anonymized text using stylometry—raising practical risks for whistleblowers, journalists, and researchers relying on pseudonymity.
What this is about
The authors introduce a stylometry-assisted agent approach (SALA) that combines quantitative stylometric features (lexical/syntactic/readability/semantic signals) with LLM reasoning to narrow down likely authors, and they analyze how effective it can be under different conditions.
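To make the feature side of such a pipeline concrete, here is a minimal sketch of the kind of lexical/syntactic signals stylometric tooling typically computes. The function name, the feature choices, and the function-word list are illustrative assumptions, not the paper's actual SALA feature set.

```python
import re
from collections import Counter

# Illustrative function-word list; topic-independent words like these are a
# classic stylometric signal. (Assumption: not the paper's actual list.)
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "was", "it"}

def stylometric_features(text: str) -> dict:
    """Compute a few simple lexical/syntactic features from raw text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words) or 1
    counts = Counter(words)
    features = {
        "avg_word_len": sum(len(w) for w in words) / n,     # lexical signal
        "avg_sentence_len": n / (len(sentences) or 1),      # syntactic proxy
        "type_token_ratio": len(counts) / n,                # vocabulary richness
    }
    # Relative frequency of each function word (hard to suppress by rewriting).
    for w in sorted(FUNCTION_WORDS):
        features[f"fw_{w}"] = counts[w] / n
    return features
```

A vector like this, computed once for the anonymous text and once per candidate author, is what the LLM-plus-tools agent would reason over instead of guessing from the raw prose alone.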
Key points
- Tool augmentation matters: combining feature extraction with LLM reasoning can outperform “pure prompting” approaches and reduce hallucination-prone guesswork.
- Candidate retrieval: a database warm-up step can substantially increase the chance the true author is in the candidate set (as reported by the paper).
- Dual-use: the same pipeline can be repurposed from attribution to targeted anonymization/rewriting strategies.
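The candidate-retrieval step above can be sketched as a simple nearest-neighbor ranking: compare a feature vector of the anonymous sample against vectors built from each candidate's prior writing. The author names and vectors below are made up for illustration, and cosine similarity over raw feature vectors stands in for whatever retrieval the paper actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(query, candidates):
    """Rank candidate authors by similarity to the anonymous sample."""
    return sorted(
        ((name, cosine(query, vec)) for name, vec in candidates.items()),
        key=lambda t: t[1],
        reverse=True,
    )

# Toy data: e.g. (avg_word_len, avg_sentence_len, type_token_ratio).
anon = [4.6, 18.2, 0.52]
corpus = {
    "author_a": [5.1, 12.0, 0.61],
    "author_b": [4.5, 18.5, 0.50],
    "author_c": [3.9, 25.0, 0.44],
}
```

With this toy data, `rank_candidates(anon, corpus)` places `author_b` first; in the agent setting, the LLM would then reason over the top-ranked candidates rather than the full corpus, which is why a well-populated candidate database matters so much.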
Why it matters
As LLM agents get better at coordinating tools, risks shift from “the model guesses” to “the model runs an analysis pipeline.” Stylometry has long been a deanonymization vector; agentic tooling can make it cheaper, faster, and more accessible—changing the threat model for anyone publishing sensitive writing under a pseudonym.
Practical takeaways
- If anonymity is critical, assume an adversary can run automated stylometric analysis—don’t rely on “light” rewriting.
- Organizations hosting anonymous submissions should consider threat-model guidance and mitigations (e.g., standardized writing templates, editorial rewriting, strict metadata controls).
- Researchers should treat stylometry + agents as a first-class safety topic in LLM deployments.
Caveats / what to watch
- Results depend heavily on the size and quality of the candidate-author corpus, and on how much text is available both for the anonymous sample and for each candidate.
- “Anonymization” is not a binary; defenses often degrade under adaptive attacks.