Definition

Format-preserving anonymisation

Format-preserving anonymisation is a substitution technique in which the replacement value satisfies the same syntactic format as the original (a valid IBAN checksum, a Luhn-valid card number, a well-formed email address), so downstream systems that validate format still accept the placeholder.
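
As an illustration, the card-number case can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names and the `key` parameter are hypothetical, and deriving the digits from an HMAC is an assumption chosen so that the same input always maps to the same placeholder. It is a one-way substitution, not reversible format-preserving encryption, and it preserves only the length and the Luhn checksum.

```python
import hashlib
import hmac


def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the digit string ends with a correct Luhn check digit."""
    return (
        number.isdigit()
        and len(number) > 1
        and luhn_check_digit(number[:-1]) == number[-1]
    )


def anonymise_card(number: str, key: bytes) -> str:
    """Replace a card number with a same-length, Luhn-valid placeholder.

    Hypothetical helper: digits come from an HMAC-SHA256 of the original,
    so the mapping is deterministic per key. Taking each byte modulo 10 is
    slightly biased, which is acceptable for a sketch only. Supports
    numbers of up to 33 digits (32 digest bytes plus the check digit).
    """
    digest = hmac.new(key, number.encode(), hashlib.sha256).digest()
    payload = "".join(str(b % 10) for b in digest[: len(number) - 1])
    return payload + luhn_check_digit(payload)


if __name__ == "__main__":
    original = "4111111111111111"  # well-known test number, not a real card
    placeholder = anonymise_card(original, key=b"demo-key")
    print(len(placeholder) == len(original), luhn_valid(placeholder))
```

A downstream validator that only checks length and the Luhn checksum accepts the placeholder exactly as it would accept the original. An IBAN or email variant would follow the same pattern with its own format rule.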

For an AI DLP (data loss prevention) layer, format preservation is what lets the LLM reason normally about the data structure ("this looks like a customer's IBAN, route it to the EU department") without ever seeing the real value. Without it, generic [REDACTED] markers break LLM reasoning chains and produce useless answers, which leads users to bypass the DLP entirely.

Why it matters

  • Bypass rate is the early-warning indicator: if users start retyping prompts to evade the DLP, anonymisation is too aggressive or too disruptive.
  • Format-preserving values are what make it technically feasible to keep a Block / Anonymize mode enabled over the long term.

Related terms