Database Migration on Live Systems: Expand-Contract Patterns That Avoid Downtime

June 26, 2026#zero-downtime database migration

Sandor Farkas

Founder & Lead Developer

Expert in software development and legacy code optimization

Most teams have a procedure for deploying code changes without downtime. Far fewer have one for schema changes. Yet the database is where zero-downtime database migration plans collapse. A background ALTER TABLE on a 50-million-row table can hold a lock for minutes. Renaming a column breaks the running application the moment the migration commits. Dropping a column works fine - until the old code version, still serving traffic during a rolling deployment, panics because the column is gone.

The expand-contract pattern solves this. It is not a single command or a library; it is a discipline for splitting any schema change into two or more safe, independently deployable steps. Each step leaves the database in a state that works with both the current and the previous version of your application code.

Why Schema Changes Are Different from Code Changes

Code changes are usually reversible in seconds. You roll back a deployment, and the old binary is serving requests again. Schema changes are not: a committed ALTER TABLE stays committed. You can undo it with another migration, but you cannot un-commit it without another round-trip through your deployment pipeline.

This asymmetry means that a schema change and the code that depends on it cannot be deployed atomically, at least not on a live system where you want to keep serving requests throughout. Something has to give. The expand-contract pattern says: let the database temporarily carry both old and new representations, and let code catch up in a separate deployment.

The Three Phases

Expand means adding to the schema without removing anything. New columns are added as nullable or with defaults. New tables are added. New indexes are built concurrently (PostgreSQL supports CREATE INDEX CONCURRENTLY, which does not hold a table lock). Nothing that the current application code uses is touched.

Migrate is the data-movement step. You backfill the new column with values derived from the old one. You copy rows between tables. You populate lookup data. During this phase, both old and new code can run simultaneously because the old column still exists and the new one is already populated. A background job or a batched SQL script handles this; it does not have to run in a single transaction.

Contract is the cleanup phase: removing the old column, dropping the old table, deleting the compatibility code paths. This only runs after you have confirmed that no running application instance reads from or writes to the old structure. In a rolling deployment, that means waiting until 100% of instances have moved to the version that uses the new structure.

A Concrete Example: Renaming a Column

Renaming a column is one of the simplest schema changes and one of the most hazardous if done naively. Here is how expand-contract handles it.

Suppose you have a users table with a full_name column. You want to rename it to display_name.

Step 1 - Expand. Add the new column:

ALTER TABLE users ADD COLUMN display_name VARCHAR(255);

At this point, display_name is null for all rows. The current application still reads and writes full_name and does not notice.

Step 2 - Deploy the dual-write code. Ship a version of your application that writes to both full_name and display_name on every write, and reads from full_name (the authoritative source) on every read. This version is backward-compatible: it still works if a previous instance runs alongside it.

Step 3 - Backfill. Copy the data from old to new for existing rows:

UPDATE users SET display_name = full_name WHERE display_name IS NULL;

On a large table, run this in batches rather than in a single transaction to avoid long lock contention. In Symfony, a console command that processes rows in chunks of a few thousand, with a short sleep between batches, is a practical approach. On PostgreSQL you can also use pg_advisory_lock to avoid competing with application writes if writes are infrequent.

Step 4 - Switch reads to the new column. Deploy a version that reads from display_name and continues writing to both. Verify correctness under production traffic. Your old version is no longer running at this point.

Step 5 - Stop writing to the old column. Deploy a version that reads and writes only display_name. The full_name column is now orphaned.

Step 6 - Contract. Once you are certain no live process touches full_name, drop it:

ALTER TABLE users DROP COLUMN full_name;

On PostgreSQL, dropping a column is typically fast because Postgres marks it as dropped rather than immediately rewriting the table. The actual space is reclaimed lazily via VACUUM or immediately with VACUUM FULL (which does lock).

Handling NOT NULL Constraints

Adding a NOT NULL column to a large table used to require a full table rewrite in PostgreSQL because Postgres needed to verify the constraint for every row. Since PostgreSQL 11, you can add a NOT NULL column with a non-volatile default without a rewrite, because Postgres stores the default value as metadata rather than writing it into every row.

For versions before 11, or when the default is computed rather than constant, the safe approach is:

Add the column as nullable.
Backfill all rows.
Add a DEFAULT so new rows are covered.
Add the constraint as NOT VALID - this skips checking existing rows.
Run VALIDATE CONSTRAINT separately - this takes only a SHARE UPDATE EXCLUSIVE lock, which does not block reads or writes.

This is more steps, but each step is non-blocking.

Index Creation Without Locking

A regular CREATE INDEX in PostgreSQL holds a SHARE lock, blocking all writes for the duration. CREATE INDEX CONCURRENTLY builds the index in multiple passes while the table remains fully available for reads and writes. The tradeoff is that it takes longer and uses more CPU, but on a live production system that is the correct tradeoff.

The expand phase of any schema migration that adds an index should always use CONCURRENTLY. In Doctrine Migrations or raw SQL migration scripts, this means using a separate migration file for index creation so it can be run independently of the schema change itself.

Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction. If your migration framework wraps all statements in a transaction by default (Doctrine does), you need to either disable that for the index migration or run the CREATE INDEX CONCURRENTLY statement via a separate manual step.

Where This Fits in a Legacy System Upgrade

If you are working through a larger legacy code optimization effort, database schema changes are often the most risky part. Legacy systems accumulate wide tables with ambiguous column names, implicit constraints enforced only in application code, and missing indexes on foreign keys. Cleaning this up requires exactly the expand-contract discipline: you cannot correct ten years of schema drift in a single migration without shutting down.

The same applies to teams doing custom software development that involves migrating data from an old system to a new one. You often need to run old and new schemas in parallel during a transition period while data is verified and migrated. Expand-contract gives you a structured way to manage that period without a hard cutover.

Tooling Considerations

Most migration tools - Doctrine Migrations, Flyway, Liquibase - are designed around linear, sequential migrations. They do not enforce expand-contract as a policy; that is your team's responsibility. What they do provide is versioning and tracking so you can reason about which migration steps have run on which environment.

A few practices help:

Keep migration steps small and named by intent. A file called 1_add_display_name_column.sql is easier to reason about than a large migration that does five things at once.

Never mix expand, migrate, and contract in one file. If the steps are separated, you can deploy the expand step, let it run, verify it, deploy the code change, and schedule the contract step for a later release. If they are in one file, you lose that flexibility.

Test the zero state. Before running the contract step, verify in staging that no application code reads the old column. A simple grep across the codebase plus a short observation period in production with query logging is usually enough.

The Discipline Is the Point

The expand-contract pattern is not technically difficult. The SQL is straightforward. The challenge is organizational: teams are under pressure to ship, and a two-step migration that takes two deployment cycles feels slower than a single migration that gets it done in one.

But the single-step migration is only faster if nothing goes wrong. When a locked table causes a cascade of timeouts at 2 AM, the expand-contract approach starts looking like the faster path.

If your team is navigating a backlog of schema changes on a live system and you are not sure how to sequence them safely, that is the kind of problem Wolf-Tech works through regularly. Get in touch at hello@wolf-tech.io or visit wolf-tech.io to start a conversation about your specific situation.