Format-Preserving Anonymization for LLM DLP

The problem with XXXX

Here is a 16-digit card number in Visa format: 4532-1488-0343-6467. Now redact it the naive way and send this to ChatGPT:

Is XXXX-XXXX-XXXX-XXXX a valid card number?

The model will answer something like "no, that is a placeholder, not a real card number". Technically correct. Practically useless. The question you actually wanted answered was "is the card my user just typed structurally valid", and you destroyed every signal the model needed to give that answer.

Now try the format-preserving rewrite:

Is 4532-1488-XXXX-6467 a valid card number?

The model can now reason: leading 4 means Visa, 16 digits matches Visa length, the last 4 are visible for downstream display, and the middle digits are clearly redacted. It can answer the actual question: yes, the pattern is consistent with a real Visa card, the checksum could be regenerated, and here is how to display it to the user.

This is the gap between naive redaction and format-preserving anonymization. The first is a security checkbox that breaks the product. The second is what Zeuslock ships by default. It exists because the alternative, users ripping out DLP because it made their AI assistant stupid, is worse for security than the small amount of structure a format-preserving rewrite exposes.

Naive redaction is information destruction

The reason engineers reach for XXXX is that it feels safe. It is the strongest possible scrub: every byte of the secret is gone. The problem is that you also destroyed the metadata. Three things travel with a real value, and you almost never want to lose all three:

The format. A 16-digit hyphenated string is a payment card. An IBAN starts with two letters. A JWT has two dots. A bearer token is often a base64url string. The format is how the LLM knows what kind of object it is talking about.
The semantics inside the format. The leading 4 of a Visa, the country prefix of an IBAN, the AKIA prefix of an AWS long-term key, the kid in a JWT header — these tell the model which library, which provider, which jurisdiction is relevant.
The non-secret tails. Last 4 digits of a card, the email local-part length, the IBAN check digits — context the AI needs to give an answer that the user can act on.

Strip all three and you are not asking the AI for help, you are asking it to guess.

Case 1: credit cards

Take 4532-1488-0343-6467. It is 16 digits long, the leading 4 identifies Visa, and the BIN 453214 resolves to a specific issuer. There are three ways to handle it before it reaches the model, and they produce three very different LLM behaviors.

Strategy	String sent to the LLM	What the LLM can still answer
Original (no DLP)	`4532-1488-0343-6467`	Everything. Also: full PCI breach.
Naive XXXX	`XXXX-XXXX-XXXX-XXXX`	Almost nothing. "This is a placeholder."
Format-preserving	`4532-1488-XXXX-6467`	Visa. 16-digit pattern. Luhn-valid template. Last 4 visible. Issuer inferable.

The middle row is the one most homemade DLP scripts produce, and it is also the one that drives users to copy-paste the original card number into a personal ChatGPT account so they can actually get help. The third row is the one that keeps both the security team and the user productive.
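The masked string in the third row is simple to produce. The sketch below is one possible way to do it, not Zeuslock's implementation: `mask_card` and its `keep_head`, `keep_tail` and `fill` parameters are names invented for this example, and the 8/4 split simply mirrors the string shown in the table.

```python
def mask_card(card: str, keep_head: int = 8, keep_tail: int = 4, fill: str = "X") -> str:
    """Mask the middle digits of a card number, leaving separators untouched."""
    total_digits = sum(ch.isdigit() for ch in card)
    if total_digits <= keep_head + keep_tail:
        raise ValueError("card number too short to mask")
    out = []
    seen = 0  # number of digits consumed so far
    for ch in card:
        if ch.isdigit():
            keep = seen < keep_head or seen >= total_digits - keep_tail
            out.append(ch if keep else fill)
            seen += 1
        else:
            out.append(ch)  # hyphens and spaces carry the format, keep them
    return "".join(out)


print(mask_card("4532-1488-0343-6467"))  # 4532-1488-XXXX-6467
```

Tightening the policy is a one-argument change: `keep_head=6` exposes only the six-digit BIN, and `keep_head=0` drops the issuer signal entirely.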

Generating a format-preserving Luhn-valid replacement in Python

Below is a tiny generator. Given a real card number, it keeps the first six and last four digits and rewrites the middle so the result still passes Luhn. The original is never logged or persisted.

import secrets

def luhn_total(number: str) -> int:
    """Luhn sum over a complete number; the number is valid when total % 10 == 0."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total

def format_preserve_card(original: str) -> str:
    digits = ''.join(c for c in original if c.isdigit())
    if len(digits) != 16:
        raise ValueError('expected 16-digit card')
    bin6, last4 = digits[:6], digits[-4:]
    # 5 random middle digits + 1 balancing digit that restores Luhn validity
    middle = ''.join(str(secrets.randbelow(10)) for _ in range(5))
    # The 12th digit sits at an even offset from the right, so Luhn counts it
    # undoubled: put 0 there, then solve for the value that zeroes the sum mod 10.
    candidate = bin6 + middle + '0' + last4
    balance = (10 - luhn_total(candidate) % 10) % 10
    fake = bin6 + middle + str(balance) + last4
    return f'{fake[:4]}-{fake[4:8]}-{fake[8:12]}-{fake[12:]}'

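The output is a real-looking 16-digit Visa number that is very unlikely to match an issued card. That is a statistical argument, not a guarantee: the BIN and last four digits are real, and five random digits give only 100,000 possible outputs, so a collision with a live card is improbable but possible. The LLM treats it as a card, infers the issuer correctly, and the user gets a usable answer.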
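Generators like this are easy to get subtly wrong, so it is worth pairing one with an independent validator in your tests. The following is a minimal sketch of the standard Luhn check; `luhn_valid` is a name chosen for this example, and 4111-1111-1111-1111 is the widely published Visa test number, not customer data.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # every second digit from the right is doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111-1111-1111-1112"))  # False
```

Running every generated replacement through a check like this before it leaves the machine catches off-by-one mistakes in the parity logic, which are the usual failure mode.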

Case 2: IBAN

An IBAN like FR76 3000 6000 0123 4567 8901 234 carries five layers of meaning: country (FR), check digits (76), bank and branch codes (30006 and 00001, which straddle the printed groups of four), account number, and a national check key. Naive XXXX destroys all five. Now ask the LLM "which country is this account in?" and the best it can do is guess or refuse.

Format-preserving rewrite: FR76 XXXX XXXX XXXX XXXX XXXX 234. The country and mod-97 check digits are kept. A downstream model can confidently answer "France", and a generator can rebuild a valid-looking checksum if your test pipeline needs one.
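Both halves of that paragraph fit in a few lines. The sketch below masks an IBAN while keeping the country, the check digits and a short tail, and separately computes mod-97 check digits for a synthetic account body in case a test pipeline needs a checksum-valid fake. The function names are invented for this example, and the GB value is the commonly published sample IBAN, not a real account.

```python
def _to_number(s: str) -> int:
    # ISO 13616 conversion: digits stay as-is, letters map A=10 ... Z=35
    return int("".join(str(int(ch, 36)) for ch in s))


def iban_check_digits(country: str, bban: str) -> str:
    """Compute the two mod-97 check digits for a country code and BBAN."""
    remainder = _to_number(bban + country + "00") % 97
    return f"{98 - remainder:02d}"


def iban_valid(iban: str) -> bool:
    """Check the mod-97 rule: rearranged IBAN modulo 97 must equal 1."""
    compact = iban.replace(" ", "").upper()
    return _to_number(compact[4:] + compact[:4]) % 97 == 1


def mask_iban(iban: str, keep_tail: int = 3, fill: str = "X") -> str:
    """Keep country + check digits and the last `keep_tail` characters."""
    compact = iban.replace(" ", "").upper()
    body = compact[4:]
    tail = body[-keep_tail:] if keep_tail else ""
    masked = compact[:4] + fill * (len(body) - len(tail)) + tail
    # Re-group in fours, the conventional printed layout
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))


print(mask_iban("FR76 3000 6000 0123 4567 8901 234"))
# FR76 XXXX XXXX XXXX XXXX XXXX 234
print(iban_check_digits("GB", "WEST12345698765432"))  # 82
```

Note that the masked string is deliberately not checksum-valid. If a downstream validator must accept it, generate a random body and call `iban_check_digits` instead of masking.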

Trade-off: keeping FR76 leaks geographic data. For a customer-support workflow that is fine and probably wanted. For an internal HR pipeline that processes employees across jurisdictions, it might not be. Format-preservation is a policy choice, not a default for every detector. Zeuslock lets you tune it per data type per pipeline.

Case 3: AWS access keys

An AWS long-term access key ID looks like AKIAIOSFODNN7EXAMPLE. The AKIA prefix is not random: it identifies the key type (AKIA for IAM long-term keys, ASIA for STS temporary credentials). The ID is 20 characters long.

Naive redaction to XXXX tells the LLM nothing. Format-preserving AKIA**************** tells the LLM "this is an IAM long-term key, length 20, secret entropy removed". The replacement is not a valid key — AWS will reject it instantly — but the model can now correctly answer "rotate this via aws iam create-access-key, then revoke the old one" instead of "I cannot help with placeholder strings".

Same principle for Stripe (sk_live_), OpenAI (sk-), GitHub (ghp_), Slack (xoxb-), GitLab (glpat-), npm (npm_). The prefix is the type. Keep the prefix, kill the entropy.
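"Keep the prefix, kill the entropy" can be sketched as a prefix table plus a fill. This is an illustration only: `KNOWN_PREFIXES` lists just the prefixes named above, and `mask_token` is a hypothetical helper, not part of any vendor SDK.

```python
# Prefixes taken from the examples in this article; extend as needed.
KNOWN_PREFIXES = ("sk_live_", "glpat-", "xoxb-", "AKIA", "ASIA", "ghp_", "npm_", "sk-")


def mask_token(token: str, fill: str = "*") -> str:
    """Keep a recognised type prefix and replace everything after it."""
    # Try longer prefixes first so a specific prefix wins over a shorter one
    for prefix in sorted(KNOWN_PREFIXES, key=len, reverse=True):
        if token.startswith(prefix):
            return prefix + fill * (len(token) - len(prefix))
    # Unknown type: nothing is known to be non-secret, keep only the length
    return fill * len(token)


print(mask_token("AKIAIOSFODNN7EXAMPLE"))  # AKIA****************
```

The sketch preserves the token length, which is itself a small signal. If the length is sensitive in your environment, pad or truncate the fill to a fixed width.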

Case 4: JWTs

A JWT has three dot-separated segments: header.payload.signature. The header and payload are only base64url-encoded, so anyone can decode the header to discover the algorithm without verifying the signature. If an engineer pastes a real JWT into Claude with the prompt "why is my token failing validation", the model needs the structure to give useful advice.

Naive redaction collapses the JWT to XXXX. The model now has nothing to talk about. Format-preserving keeps the dots and segment lengths:

eyJhbGciOiJIUzI1NiJ9.XXXXXXXXXXXXXXXX.XXXXXXXXXXXX

The model sees three segments and can comment on the visible header ("alg: HS256, you probably want RS256 for asymmetric verification"), while the payload claims and the signature bytes never travel. The engineer gets a real answer; the secret stays local.
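A minimal sketch of that rewrite, assuming the header holds nothing sensitive (a `kid` or a custom header field might, so check before adopting it). `mask_jwt` and `decode_header` are names made up for this example; only the standard library is used.

```python
import base64
import json


def mask_jwt(token: str, fill: str = "X") -> str:
    """Keep the header, replace payload and signature with same-length filler."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    header, payload, signature = parts
    return ".".join([header, fill * len(payload), fill * len(signature)])


def decode_header(masked: str) -> dict:
    """Decode the still-visible header of a (masked) JWT."""
    header = masked.split(".")[0]
    padded = header + "=" * (-len(header) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


masked = mask_jwt("eyJhbGciOiJIUzI1NiJ9.abcdef.ghij")
print(masked)                 # eyJhbGciOiJIUzI1NiJ9.XXXXXX.XXXX
print(decode_header(masked))  # {'alg': 'HS256'}
```

Because the payload is replaced wholesale, questions about specific claims (an expired `exp`, a wrong `aud`) cannot be answered from the masked token. A looser policy could keep claim names and mask only their values.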

Case 5: emails — the hardest one

Email is the case where the right rewrite depends on what you are trying to protect. There are three candidate rewrites for alice@acme.com, and only the first two are reasonable:

Rewrite	What survives	When it is right
`user@example.com`	Generic shape only.	The LLM needs to know "this is an email". The identity and the employer are both sensitive. Default choice.
`XYxYx@acme.com`	Domain preserved.	You are debugging a deliverability issue, an SPF record, an internal email routing rule. The domain is the load-bearing context.
`alice@example.com`	Local-part preserved.	Almost never the right choice. The local-part is usually the identity.

The middle row is the one to think hard about. Preserving the domain reveals the employer of the address's owner. For a customer-support tool that is fine, because the agent already knows it is acme.com, but for a CV-screening pipeline it can leak protected characteristics. Same rule as IBANs: format-preservation is a policy, not a default.
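The first two rows of the table can be sketched as one function with a policy switch. `rewrite_email` and its `keep_domain` flag are hypothetical names for this example; `example.com` is used because it is a domain reserved for documentation.

```python
import secrets
import string


def rewrite_email(address: str, keep_domain: bool = False) -> str:
    """Rewrite an email address according to a simple two-option policy."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address")
    if not keep_domain:
        # Default: generic shape only, identity and employer both removed
        return "user@example.com"
    # Domain kept: random letters of the same length replace the identity
    fake_local = "".join(secrets.choice(string.ascii_letters) for _ in local)
    return f"{fake_local}@{domain}"


print(rewrite_email("alice@acme.com"))                    # user@example.com
print(rewrite_email("alice@acme.com", keep_domain=True))  # e.g. five random letters @acme.com
```

The third row, keeping the local part, is deliberately not offered as an option here.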

When format-preserving is the wrong choice

The whole point of format-preservation is that the format carries semantically useful, non-secret information. There are categories where the format itself is the secret. In those cases, anonymize harder, or block.

National SSN-style identifiers. A US SSN's first three digits used to encode the issuing state. A French NIR encodes sex, birth year, birth month, birth département and commune. "Format-preserving" an NIR means leaking demographics. The right move is full redaction or block.
Internal customer IDs. If your customer ID is ACME-2024-018472 and the year encodes onboarding cohort, the format leaks business intelligence. Treat it as opaque.
Sequential IDs in a small space. If only 5000 employees exist and the employee ID is sequential, the whole ID space holds barely 12 bits of entropy, and any preserved digits shrink it further. The model (or a curious observer) can then re-identify the employee from the surrounding context.
Anything where the regulator has explicitly said "do not retain pseudonymous identifiers without justification". The CNIL, the AEPD and the BfDI all take this position for special-category data under the GDPR. If you cannot point to a legitimate processing reason for keeping the prefix, drop the prefix.

The trade-off Zeuslock chose, and why

Zeuslock's defaults err toward preserving structure. The reason is a simple behavioral observation: when DLP makes the AI assistant useless, the user does not give up on the AI assistant. The user gives up on the DLP. They copy the prompt into a personal ChatGPT account, or a phone, or a coffee-shop browser. The leak still happens, plus you have lost the audit trail.

The hardest part of building a DLP for LLMs is not detection. It is making the protected experience good enough that the user does not actively try to bypass it.

Format-preserving anonymization is the technical primitive that makes that work. The user keeps getting useful answers, the secret bytes stay on the laptop, and the operator gets a full audit trail of which detector fired and which rewrite was applied. Per data type, you can tighten any rewrite down to XXXX, or escalate to Block mode — but the default is structured-but-fake, because that is the default that survives contact with real users.
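To make "per data type" concrete, here is a minimal sketch of a policy table with three modes. It is purely illustrative and is not Zeuslock's configuration format: the `Mode` enum, the `POLICY` keys, `PromptBlocked` and `apply_policy` are all invented for this example, and the defaults simply mirror the guidance in this article.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    PRESERVE = "preserve"  # structured rewrite, format kept
    REDACT = "redact"      # opaque placeholder
    BLOCK = "block"        # refuse to send the prompt at all


# Hypothetical defaults; tighten any entry per pipeline.
POLICY = {
    "credit_card": Mode.PRESERVE,
    "jwt": Mode.PRESERVE,
    "iban": Mode.PRESERVE,
    "email": Mode.PRESERVE,
    "national_id": Mode.REDACT,
}


class PromptBlocked(Exception):
    """Raised when a detected value must not leave the machine."""


def apply_policy(data_type: str, value: str, preserve: Callable[[str], str]) -> str:
    mode = POLICY.get(data_type, Mode.REDACT)  # unknown types fail closed
    if mode is Mode.BLOCK:
        raise PromptBlocked(data_type)
    if mode is Mode.REDACT:
        return "XXXX"
    return preserve(value)  # caller supplies the format-preserving rewrite
```

Defaulting unknown types to full redaction keeps a newly added detector safe until someone decides its format is non-secret.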

What to do next

Audit your current redaction. If anywhere in your stack you are replacing detected values with [REDACTED], XXXX or an empty string before they reach an LLM, you are degrading the answer, and for data types whose structure is not secret you gain little or no security over a structured rewrite.
Pick the format-preservation policy per data type. Cards and JWTs almost always benefit. IBANs and emails are a judgement call. SSN-class identifiers should usually be fully redacted.
Show the rewrite to the user before sending. Zeuslock's pre-send preview teaches the habit faster than any training program — see the operator guide on configuring detection policies for the recommended Monitor → Anonymize → Block rollout.
