Draft
TL;DR
CLFEC proposes a benchmark for correcting both linguistic
errors (spelling/grammar/punctuation) and factual errors (wrong
entities, dates, numbers) in paragraph-level Chinese professional
writing. The paper argues that unified correction in a shared context
works better than a sequential pipeline, and that agentic workflows
help strong models but can destabilize weaker ones.
What this is about
Most Chinese text correction benchmarks focus on linguistic errors
(character/grammar fixes), while factual correction is often treated as
a separate “verify-and-rewrite” problem. In real professional
writing—finance, law, medicine, current affairs—these error types
co-occur and interact at the paragraph level.
CLFEC is introduced as a unified task and dataset to evaluate this
combined setting. The dataset is built from domain texts with injected
and manually verified error instances, and the evaluation treats
corrections as minimal span edits.
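The minimal-span-edit framing can be made concrete with a small diff-based scorer. This is a hypothetical illustration, not the benchmark's official metric: extract character-level edits from the source against a correction, then compare predicted edits to gold edits as sets.

```python
import difflib

def span_edits(source: str, corrected: str):
    """Extract minimal (start, end, replacement) edits turning source into corrected."""
    sm = difflib.SequenceMatcher(a=source, b=corrected)
    return [(i1, i2, corrected[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]

def edit_f1(source: str, predicted: str, gold: str) -> float:
    """Score a predicted correction against the gold correction as edit sets."""
    pred = set(span_edits(source, predicted))
    ref = set(span_edits(source, gold))
    if not pred and not ref:      # clean input correctly left untouched
        return 1.0
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

Note that scoring edits (rather than whole rewritten paragraphs) naturally penalizes over-correction: any edit on an error-free input counts against precision.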
Key points
- Defines a unified task: linguistic + factual error correction in the
  same paragraph context.
- Dataset spans multiple professional domains and includes different
  diagnostic splits (linguistic-only, factual-only, mixed, error-free).
- Finds that unified correction can outperform a staged pipeline because
  the model sees interactions between error types.
- Notes an important production issue: over-correction on clean inputs
  can hurt trust.
- Explores an agentic proofreading setup (plan → search/verify → revise)
  that benefits strong models but may hurt weaker ones.
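The plan → search/verify → revise loop in the last point can be sketched as follows. Here `plan_fn`, `verify_fn`, and `revise_fn` are hypothetical stand-ins for the LLM and search/verification backends (not the paper's actual system), and the similarity guard is one illustrative way to keep a weaker model's rewrites from drifting.

```python
import difflib

def agentic_proofread(paragraph, plan_fn, verify_fn, revise_fn, max_rounds=3):
    """Iteratively revise a paragraph, accepting a revision only if it stays
    close to the current text (a crude guard against destabilizing rewrites)."""
    text = paragraph
    for _ in range(max_rounds):
        claims = plan_fn(text)                             # 1. list checkable claims
        issues = [c for c in claims if not verify_fn(c)]   # 2. ground / verify each
        if not issues:
            break
        candidate = revise_fn(text, issues)                # 3. minimal rewrite
        # Reject drastic rewrites: a similarity ratio below 0.8 suggests drift.
        if difflib.SequenceMatcher(a=text, b=candidate).ratio() < 0.8:
            break
        text = candidate
    return text
```

A toy run, with the planner splitting on whitespace and the verifier rejecting one token: `agentic_proofread("Founded in 2091.", str.split, lambda c: c != "2091.", lambda t, _: t.replace("2091", "2019"))` yields the corrected sentence.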
Why it matters
This is directly relevant if you’re building “document agent” systems
(proofreading, compliance, editing) where factual correctness and
language correctness are entangled. The paper also highlights an
operational signal: measuring and controlling over-correction is as
important as catching true errors.
Practical takeaways
- Treat proofreading as a workflow: plan, ground facts, verify, then
  apply minimal edits.
- Add guardrails against over-correction: e.g., calibrate to keep edits
  conservative on clean text.
- Consider mixing deterministic verification tools (span checks) with
  LLM rewriting to reduce hallucinated edits.
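The last two takeaways can be combined into a simple acceptance guard. All names and thresholds here are illustrative assumptions: accept a proposed correction only when its edit footprint is small and every modified source span passes a deterministic check (`span_ok` could be a dictionary lookup, date validator, etc.).

```python
import difflib

def guarded_apply(source, proposed, span_ok=lambda s: True, max_edit_ratio=0.3):
    """Accept a proposed correction only when the total edited footprint is
    small and every modified source span passes a deterministic check."""
    ops = difflib.SequenceMatcher(a=source, b=proposed).get_opcodes()
    edits = [(i1, i2, j1, j2) for tag, i1, i2, j1, j2 in ops if tag != "equal"]
    # Count changed characters on both sides so insertions are penalized too.
    changed = sum((i2 - i1) + (j2 - j1) for i1, i2, j1, j2 in edits)
    if changed > max_edit_ratio * max(len(source), 1):
        return source              # too aggressive: keep the original text
    if any(i2 > i1 and not span_ok(source[i1:i2]) for i1, i2, _, _ in edits):
        return source              # an edited span failed verification
    return proposed
```

With the defaults, a one-character typo fix passes through while a wholesale rewrite is rejected, which is one way to keep edits conservative on clean text.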
Caveats / what to watch
- Dataset availability and reproducibility may depend on release status.
- Factual correction quality depends heavily on the
  retrieval/verification source.