GDPR Right-to-Erasure Engineering: Actually Deleting Users From Complex SaaS Systems

Sandor Farkas - Founder & Lead Developer at Wolf-Tech

A Berlin fintech I audited last winter had a neat-looking "Delete my account" button in the customer portal. Clicking it fired a DELETE /users/:id against their Symfony API, which set deleted_at on the users table and returned 204. The CS team closed the ticket, the customer saw the success message, and the DPO's dashboard counted the request as resolved inside the statutory one month. On paper, the GDPR right to erasure was satisfied. In reality: the user's email still sat in Postgres as the owner of twelve orders rows, their IP address was in nginx logs on three servers, their support conversation lived untouched in Intercom, their transaction metadata was in Snowflake for the next seven years of financial reporting, and their full profile was in nightly Postgres backups rotating over the following 35 days. Under Article 17, every one of those was still processing of personal data. Under Article 58, any of them was enough for a supervisory authority to open a file.

This is the gap between "soft delete on the users table" and actual GDPR right to erasure. It is also where most SaaS products in 2026 quietly fail — not because teams do not care, but because honest deletion is a cross-system engineering problem that touches databases, search indexes, queues, object storage, caches, logs, backups, analytics warehouses, and a long tail of third-party processors, each with its own retention model and its own idea of what "forget this user" means.

This post walks through what a defensible GDPR data deletion flow looks like in a mid-size SaaS, with the three hard parts that usually get skipped: backups, legally-retained data, and third-party processors.

What the GDPR Right to Erasure Actually Requires Under Article 17

The right to erasure under GDPR Article 17 is narrower than most engineers assume, and also wider than most product teams build for. Narrower, because it is conditional — it only applies when one of six grounds is met (the data is no longer necessary, consent was the basis and has been withdrawn, the processing was unlawful, and so on), and it is explicitly overridden by other legal obligations. A German accounting record you are required to keep for ten years under the Handelsgesetzbuch is not erasable on request; the lawful basis is legal obligation, not consent. Wider, because when the right does apply, it covers all processing — everywhere the data lives, across every processor you use, in every form including backups.

There are three practical consequences for an engineering team. First, a SaaS user-deletion flow must distinguish between data that is genuinely erasable and data that is subject to a retention obligation that takes precedence — and it must document the distinction per data category, not per table. Second, "erasure" does not mean the row is deleted from a single database; it means personal data is no longer processed, which can be satisfied by deletion, by anonymisation strong enough that re-identification is not reasonably possible, or by moving data into a tombstone form that contains no personal identifiers. Third, the obligation includes telling third-party controllers you have shared the data with — your own subprocessors are your problem, but external controllers you have pushed data to (CRM integrations, joint-controller arrangements) must be notified of the erasure request under Article 17(2).

A useful mental model: for each data category in your system, decide in advance whether it is erasable, retained-under-obligation, or eligible-for-anonymisation. Write that down. Engineering follows the map; it does not invent it per ticket.
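
As a sketch, that map can live directly in code, so the orchestrator consults a declared policy instead of deciding per ticket. The enum and category names below are illustrative, not from any particular codebase:

```php
<?php
// Illustrative per-category erasure policy. Category names and strategies
// are examples; the real map comes from your own data inventory.

enum ErasureStrategy: string
{
    case Erase     = 'erase';                   // delete outright
    case Retain    = 'retain_under_obligation'; // e.g. HGB ten-year accounting records
    case Anonymise = 'anonymise';               // strip identifiers, keep the record
}

final class DataCategoryMap
{
    /** @return array<string, ErasureStrategy> data category => strategy */
    public static function categories(): array
    {
        return [
            'identity'    => ErasureStrategy::Erase,
            'contact'     => ErasureStrategy::Erase,
            'financial'   => ErasureStrategy::Retain,
            'behavioural' => ErasureStrategy::Anonymise,
            'support'     => ErasureStrategy::Anonymise,
        ];
    }
}
```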

Mapping the Data: The Part You Cannot Skip

Every deletion architecture I have seen fail has skipped the mapping step. An inventory of where personal data lives in the product is a prerequisite, not a nice-to-have, and it is the single artefact the DPO will ask for first when a regulator comes calling.

A workable inventory lists, for each data category (identity, contact, financial, behavioural, support, etc.), every system that holds it, the lawful basis for processing, the retention requirement, and the erasure strategy. For a typical B2B SaaS, that usually resolves to something like:

  • Primary operational database (Postgres / MySQL) — identity, billing, content.
  • Search indexes (Elasticsearch, Meilisearch, Typesense) — denormalised copies of profile and content data.
  • Object storage (S3, Cloudflare R2, Hetzner Storage Box) — uploaded files, avatars, exports.
  • Caches (Redis, Memcached) — session data, short-lived denormalised profiles.
  • Queues and event streams (RabbitMQ, SQS, Kafka) — in-flight events carrying user fields.
  • Analytics warehouse (BigQuery, Snowflake, ClickHouse) — event tables, often with user_id keys.
  • Logs (application, web server, audit) — IPs, emails in stack traces, URL parameters.
  • Backups (operational DB, warehouse, object storage) — point-in-time snapshots of everything above.
  • Third-party processors — payment provider, email delivery, support tool, CRM, telemetry.

Each line becomes a ticket in the erasure workflow. The orchestration layer we build next assumes this map exists.
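
That handoff from map to workflow can be sketched as a registry the orchestrator reads when opening a request; the system and step names here are invented for illustration:

```php
<?php
// Illustrative: the data map expressed as code. Each entry in the map
// yields one typed step in an erasure plan; names are hypothetical.

final class ErasurePlanBuilder
{
    /** system => orchestrator step, one per line of the data inventory */
    private const MAP = [
        'postgres_primary' => 'DeleteFromPrimaryDb',
        'search_index'     => 'PurgeFromSearchIndex',
        'object_storage'   => 'RedactObjectStorage',
        'warehouse'        => 'AnonymiseWarehouseRows',
        'stripe'           => 'CallStripeErasure',
    ];

    /** @return list<string> ordered step names for a new erasure request */
    public function buildPlan(): array
    {
        return array_values(self::MAP);
    }
}
```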

Designing the Erasure Orchestrator

A reliable GDPR deletion flow is a long-running, idempotent, auditable state machine. Treat it as such and the rest of the design falls into place.

Concretely, a Symfony implementation using the Messenger component and Doctrine looks like this:

// src/Entity/ErasureRequest.php

use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Uid\Uuid;

#[ORM\Entity]
class ErasureRequest
{
    public const STATUS_RECEIVED   = 'received';
    public const STATUS_VERIFIED   = 'verified';
    public const STATUS_PROCESSING = 'processing';
    public const STATUS_COMPLETED  = 'completed';
    public const STATUS_BLOCKED    = 'blocked'; // legal hold

    #[ORM\Id, ORM\Column(type: 'uuid')]
    private Uuid $id;

    #[ORM\Column]
    private string $subjectRef; // pseudonymous reference, NOT the email

    #[ORM\Column]
    private \DateTimeImmutable $receivedAt;

    #[ORM\Column]
    private \DateTimeImmutable $dueBy; // one month from receipt, extendable by two further months (Art. 12(3))

    #[ORM\Column(type: 'json')]
    private array $plan = []; // list of steps from the data map

    #[ORM\Column(type: 'json')]
    private array $completed = []; // step → timestamp + evidence hash

    #[ORM\Column]
    private string $status = self::STATUS_RECEIVED;
}

Each step in plan is a typed command handled by its own consumer: DeleteFromPrimaryDb, PurgeFromSearchIndex, RedactObjectStorage, AnonymiseWarehouseRows, CallStripeErasure, CallIntercomErasure, and so on. Consumers are idempotent (calling DeleteFromPrimaryDb twice is harmless) and record an evidence hash — a non-reversible proof that the step ran — into completed. That evidence table is the defensible audit trail you hand to a regulator.
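
One such consumer might look like the following sketch. The handler wiring, table names, and the subject_refs lookup are assumptions for illustration, not the post's actual code; the properties that matter are idempotency, re-resolution of the pseudonymous reference inside the transaction, and the recorded evidence hash.

```php
<?php
// Illustrative Messenger handler: idempotent, re-resolves the pseudonymous
// reference under a row lock, and records a non-reversible evidence hash.
// Class, table, and service names are hypothetical.

use Symfony\Component\Messenger\Attribute\AsMessageHandler;

final readonly class DeleteFromPrimaryDbMessage
{
    public function __construct(public string $subjectRef, public string $requestId) {}
}

#[AsMessageHandler]
final class DeleteFromPrimaryDbHandler
{
    public function __construct(private \Doctrine\DBAL\Connection $db) {}

    /** Non-reversible proof that a step ran, stored in the audit trail. */
    public static function evidenceHash(string $step, string $requestId, int $timestamp): string
    {
        return hash('sha256', $step . '|' . $requestId . '|' . $timestamp);
    }

    public function __invoke(DeleteFromPrimaryDbMessage $message): void
    {
        $this->db->transactional(function () use ($message): void {
            // Re-resolve inside the transaction; the email never rides the bus.
            $userId = $this->db->fetchOne(
                'SELECT user_id FROM subject_refs WHERE ref = ? FOR UPDATE',
                [$message->subjectRef],
            );

            if ($userId === false) {
                return; // already deleted — redelivery is a harmless no-op
            }

            $this->db->executeStatement('DELETE FROM users WHERE id = ?', [$userId]);

            $evidence = self::evidenceHash('DeleteFromPrimaryDb', $message->requestId, time());
            $this->db->executeStatement(
                'UPDATE erasure_requests SET completed = completed || CAST(? AS jsonb) WHERE id = ?',
                [json_encode(['DeleteFromPrimaryDb' => $evidence]), $message->requestId],
            );
        });
    }
}
```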

Two design rules earn their keep immediately. First, do not carry the email or any direct identifier through the message bus. Resolve to a pseudonymous reference at request intake, and have each consumer re-resolve against the primary DB under a database lock. If the orchestration run itself leaked the email into queue metadata, you have just created new processing of the data you were meant to erase. Second, treat backup-resident data as a separate obligation — the orchestrator cannot delete from backups in real time, and must not pretend to.

The Backup Problem: Tombstones and Replay Lists

Backups are where naive erasure flows collapse. A 30-day rotating Postgres backup means that for a full month after a deletion request, the user's record is recoverable from an encrypted snapshot in S3. Regulators have accepted that immediate deletion from backups is often technically disproportionate — but only if the vendor has a documented, enforced process that prevents restored data from silently re-entering processing.

The pattern that holds up in practice has three components.

First, a tombstone table in the primary database. When the orchestrator deletes the user row, it writes an entry into deletion_tombstones with the pseudonymous reference, the deletion timestamp, and the request ID. This table is backed up together with the rest of the operational database, so every point-in-time snapshot carries its own record of deletions.
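
A minimal shape for that table, sketched as Postgres DDL (column names are illustrative):

```sql
-- Illustrative schema: one row per completed erasure, backed up together with
-- the operational database so every snapshot carries its own deletion record.
CREATE TABLE deletion_tombstones (
    subject_ref text        NOT NULL,               -- pseudonymous reference, never the email
    request_id  uuid        NOT NULL,               -- links back to the ErasureRequest audit trail
    deleted_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (subject_ref, request_id)
);
```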

Second, a restore playbook that is part of the disaster recovery runbook, not an afterthought. Whenever a backup is restored for any reason, the playbook requires replaying the tombstone table against the restored data before the restored system is allowed back into production traffic. For a Symfony application, that replay is a console command:

// src/Command/ReplayErasureTombstonesCommand.php

use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

#[AsCommand(name: 'gdpr:replay-tombstones')]
final class ReplayErasureTombstonesCommand extends Command
{
    public function __construct(
        private readonly EntityManagerInterface $em,
        private readonly ErasureOrchestrator $eraser,
    ) {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $tombstones = $this->em->getRepository(DeletionTombstone::class)->findAll();

        foreach ($tombstones as $tombstone) {
            // Re-run the original erasure plan against the restored data set.
            $this->eraser->executePlan(
                $tombstone->getSubjectRef(),
                reason: 'post-restore tombstone replay',
                originalRequestId: $tombstone->getRequestId(),
            );
        }

        $output->writeln(sprintf('Replayed %d erasure tombstones.', count($tombstones)));

        return Command::SUCCESS;
    }
}

Third, a retention cap on backups themselves. A 30-day window is defensible; a year of nightly backups containing deleted-user personal data is not. Shorter windows — and separate long-term backup lanes that contain only anonymised or legally-retained data — shrink the surface dramatically.
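
For S3-style object storage, that cap can be enforced mechanically with a lifecycle rule rather than a cron job. A sketch, assuming backups live under a dedicated prefix (the bucket layout and rule ID are invented):

```json
{
  "Rules": [
    {
      "ID": "expire-operational-db-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/postgres/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```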

With those three in place, backups stop being a GDPR backup deletion landmine and become a documented exception with a working containment mechanism.

Anonymisation Where Deletion Breaks the Product

Some personal data cannot be deleted without breaking something the business legitimately needs. Order history, invoice line items, audit logs of security-relevant actions, aggregated analytics — each has a legal or operational reason to survive the user.

The right answer here is not to refuse the erasure. It is to anonymise the data at the moment of deletion so that what remains is no longer personal data under GDPR. Done properly, anonymisation means the record can no longer be linked back to a natural person, even by the controller, even with all other data the controller holds.

Concretely, at the database layer:

-- Anonymise orders instead of deleting, preserving financial and operational integrity.
-- Assumes orders.customer_id is nullable (or the FK constraint is relaxed first).
UPDATE orders
SET
  customer_id        = NULL,  -- break the foreign-key link before deleting the customer row
  customer_email     = NULL,
  customer_name      = NULL,
  customer_ip        = NULL,
  shipping_address   = NULL,
  billing_address    = NULL,
  customer_reference = 'anon_' || md5(random()::text || id::text)
WHERE customer_id = :subject_id;

-- Delete the customer row; the remaining orders carry no link back to a person, which is the point.
DELETE FROM customers WHERE id = :subject_id;

Two traps to avoid. A hash of the email is not anonymisation — it is a pseudonym, because the same input produces the same output and you or anyone else could re-identify a known user by hashing their email again. Real anonymisation requires breaking the link irreversibly, which usually means nulling direct identifiers and generating fresh non-correlatable references for anything that must remain unique. And "k-anonymity above some threshold" works for analytics but not for transactional records that remain uniquely identifying because of the combination of fields (time, amount, product, geography) they carry.
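
The first trap is easy to demonstrate in a few lines: a deterministic hash of a known email is trivially recomputable, so the "anonymised" value still identifies the person, while a randomly generated reference does not.

```php
<?php
// Demonstration: a plain hash of an email is a pseudonym, not anonymisation.
// Anyone holding the email can recompute the hash and re-identify the row.

$storedValue = hash('sha256', 'alice@example.com'); // what a naive flow keeps

// Later, the controller (or an attacker) checks a known user against it:
$guess        = hash('sha256', 'alice@example.com');
$reIdentified = hash_equals($storedValue, $guess);  // true — the link was never broken

// A non-correlatable replacement breaks the link: random, derived from nothing personal.
$anonymousRef = 'anon_' . bin2hex(random_bytes(16));
```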

For the warehouse side, the pattern is similar: null the PII columns, keep the aggregate behaviour, and make sure the transformation runs as part of the erasure plan rather than the next quarterly cleanup.

Third-Party Processors and the Long Tail

The subprocessors on the Data Processing Agreement list are the part where GDPR automation either holds together or falls over. Every vendor with personal data has to be reached, and each one has its own API, its own latency, and its own definition of "delete."

The short version of what matters in 2026 for a typical European SaaS stack: Stripe has a data redaction API for customers; Intercom supports user deletion via its identity API but keeps anonymised conversation metadata; HubSpot and Salesforce require contact deletion plus deal and activity reviews; Sentry has a user-deletion endpoint that must be called per project; Mailjet, Postmark, and SendGrid each expose suppression and deletion endpoints separately; Cloudflare logs are controlled by retention settings, not per-record deletion.

Engineering-wise, the pattern that scales is a processor registry in code: each subprocessor gets an adapter class implementing ErasureAdapterInterface with one method that takes the pseudonymous reference and returns an evidence object. The orchestrator loops over adapters, retries on transient failure, escalates to a human queue on permanent failure, and never declares the request complete until every adapter reports success or has been explicitly marked as "not applicable — no data present."
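
As a sketch, that registry might look like the following; the interface, the evidence object, and the outcome values are illustrative, and the real vendor API calls live inside each adapter:

```php
<?php
// Illustrative processor registry: one adapter per subprocessor behind a
// shared interface, each call producing an evidence record for the audit
// trail. All names are hypothetical.

final readonly class ErasureEvidence
{
    public function __construct(
        public string $processor,
        public string $outcome,      // 'erased' | 'not_applicable' | 'failed'
        public string $evidenceHash, // non-reversible proof for the audit trail
    ) {}
}

interface ErasureAdapterInterface
{
    /** Idempotent: safe to call again after a transient failure. */
    public function erase(string $subjectRef): ErasureEvidence;
}

final class ProcessorRegistry
{
    /** @param iterable<ErasureAdapterInterface> $adapters */
    public function __construct(private iterable $adapters) {}

    /** @return list<ErasureEvidence> one evidence record per subprocessor */
    public function eraseEverywhere(string $subjectRef): array
    {
        $evidence = [];
        foreach ($this->adapters as $adapter) {
            // A real implementation retries transient failures and escalates
            // permanent ones to a human queue instead of letting them throw.
            $evidence[] = $adapter->erase($subjectRef);
        }
        return $evidence;
    }
}
```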

This registry is also the artefact that tells the DPO which subprocessor agreements need to change when a new vendor is onboarded or retired. If a data subject request engineering flow cannot cleanly answer which vendors were called, when, and with what result, the DPO cannot sign off on it and the regulator would not be impressed either.

What to Build First

A mid-size SaaS that wants to close the gap in a quarter, not a year, can usefully sequence the work like this: start with the inventory and classification, because the rest is useless without it. Then build the tombstone table and the restore playbook — these are the backup insurance policy and they do not depend on the orchestrator. Then ship a minimal orchestrator covering the primary database, search index, object storage, and the top three third-party processors by data volume. Extend coverage iteratively, with each new subprocessor adapter landing as a PR rather than a project.

If you are starting from an older PHP or Symfony codebase where personal data has quietly spread through twenty tables and three warehouses, a legacy code optimization pass scoped specifically to erasure mapping — finding every PII column, every dump script, every integration that exports data — is usually the fastest way to get to a defensible plan. Where the deletion flow ends up redesigning core parts of the data model, a custom software development engagement treats it as the architecture change it actually is, rather than a compliance patch.

The underlying shift is cultural more than technical. Right-to-erasure stops being a ticket CS closes in thirty days and becomes a first-class data contract the engineering team owns — with runbooks, adapters, tombstones, and an audit trail that survives inspection. Once that is in place, GDPR automation becomes tractable; before it, every deletion request is a small crisis waiting to be audited.

Wolf-Tech helps European SaaS teams design and build defensible right-to-erasure architectures across PHP/Symfony backends, Next.js frontends, analytics warehouses, and third-party integration stacks. Contact us at hello@wolf-tech.io or visit wolf-tech.io for a free consultation.