Zero-Downtime Database Migrations: A Practical Playbook
Every team has a war story that starts the same way: a schema migration that seemed straightforward, deployed on a Tuesday afternoon, and produced five minutes of 500 errors before someone figured out how to roll it back. Or worse—a migration that silently locked a critical table under load, causing a cascade of timeouts that lasted an hour and generated a postmortem nobody wanted to write.
Database migrations are uniquely dangerous because they sit at the intersection of two systems that rarely change together cleanly: your application code and your persistent data. Deployed incorrectly, a migration can cause production outages, data loss, or subtle corruption that surfaces days later. The good news is that the engineering community has developed a set of patterns that make zero-downtime database migrations reliable and repeatable. This post walks through those patterns in practical terms, with examples for PHP/Symfony applications using Doctrine Migrations.
Why Database Migrations Are Risky
The core problem with database schema changes is that they are not atomic from the perspective of a running application. When you deploy a new version of your application and run a migration simultaneously, there is a window—however brief—when either the new code is running against the old schema, or the old code is running against the new schema. Neither combination is guaranteed to work correctly.
The three most dangerous migration operations are column drops, column renames, and NOT NULL constraints added to populated columns. Each of these creates a hard incompatibility between one version of the application and one state of the schema.
A column drop is obvious: if old code tries to read or write a column that no longer exists, it throws an error. Less obvious is a NOT NULL constraint added to a column with existing rows. Even if your new code always writes a value for that column, running the migration creates a window where rows inserted by old code—still running during the deployment—will fail the new constraint. Under load, with a rolling deployment, this window can last minutes.
Table locks are a separate problem. Several common migration operations cause the database to acquire an exclusive lock on the table while the migration runs. ALTER TABLE in MySQL with MyISAM storage, or in older InnoDB without online DDL support, will lock the table for reads and writes until the operation completes. On a users table with five million rows, an index addition with a full table lock can take minutes. Every request that touches that table during those minutes either waits or fails.
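When a long-running ALTER is suspected of blocking traffic, the waiters are visible in the server's process list. A query along these lines (a sketch against MySQL's information_schema; column selection is illustrative) shows sessions queued behind a metadata lock:

```sql
-- MySQL: list sessions blocked behind a table metadata lock,
-- typically queued behind a long-running ALTER TABLE
SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE state = 'Waiting for table metadata lock'
ORDER BY time DESC;
```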
The Expand-Contract Pattern
The expand-contract pattern (sometimes called parallel change) is the foundational technique for zero-downtime schema migrations. The core idea is to decompose every breaking schema change into three separate, safe phases.
Phase 1: Expand. Add the new schema elements without removing anything. If you are renaming a column, add the new column alongside the old one. If you are changing a column type, add a new column with the correct type. If you are adding a NOT NULL constraint, add the column as nullable first. The schema now supports both the old and new application code.
Phase 2: Migrate. Deploy the new application code that writes to both the old and new columns (or reads from the new column, with a fallback to the old one during the transition). Backfill existing rows by copying data from the old column to the new one. Once all rows are backfilled and all new writes are going to the new column, the old column carries only historical data.
Phase 3: Contract. Deploy a final version of the code that no longer references the old column. Only then drop the old column from the schema. At this point, no running code reads or writes the old column, so the drop is safe.
For a concrete example, consider renaming a user_name column to display_name in a Symfony application with Doctrine.
// Migration: Phase 1 — Expand (add new column)
public function up(Schema $schema): void
{
    $this->addSql(
        'ALTER TABLE users ADD COLUMN display_name VARCHAR(255) NULL'
    );
}
At this point, both columns exist. Deploy application code that writes to both:
// Entity during transition period
use Doctrine\ORM\Mapping as ORM;

#[ORM\Entity]
class User
{
    #[ORM\Column(name: 'user_name', type: 'string', nullable: true)]
    private ?string $userName = null; // old column — still written

    #[ORM\Column(name: 'display_name', type: 'string', nullable: true)]
    private ?string $displayName = null; // new column — also written

    public function setDisplayName(string $name): void
    {
        $this->userName = $name;    // keep old column in sync
        $this->displayName = $name; // populate new column
    }

    public function getDisplayName(): string
    {
        // read from new column, fall back to old if not yet populated
        return $this->displayName ?? $this->userName ?? '';
    }
}
Backfill existing rows in a separate migration:
// Migration: Phase 2 — Backfill
public function up(Schema $schema): void
{
    $this->addSql(
        'UPDATE users SET display_name = user_name WHERE display_name IS NULL'
    );
}
For tables with millions of rows, run this backfill in batches to avoid long-running transactions that hold locks:
// Batched backfill — run as a Symfony command, not a migration
public function execute(InputInterface $input, OutputInterface $output): int
{
    $batchSize = 1000;
    do {
        // MySQL permits LIMIT on UPDATE; the IS NULL predicate guarantees
        // each batch makes progress, so no cursor bookkeeping is needed.
        // The batch size is inlined because binding LIMIT as a string-typed
        // parameter is a MySQL syntax error.
        $count = $this->connection->executeStatement(
            'UPDATE users SET display_name = user_name
             WHERE display_name IS NULL
             LIMIT ' . $batchSize
        );
        usleep(100000); // pause 100 ms between batches to yield to other queries
    } while ($count > 0);

    return Command::SUCCESS;
}
Once all rows are backfilled and the transition code has been in production for a deployment cycle, deploy the final version that removes the old column reference, and then run the contract migration:
// Migration: Phase 3 — Contract (remove old column)
public function up(Schema $schema): void
{
    $this->addSql('ALTER TABLE users DROP COLUMN user_name');
}
Online DDL and Lock-Free Schema Changes
Modern database versions have improved significantly in their ability to perform schema changes without exclusive locks. MySQL InnoDB's online DDL, available since MySQL 5.6, allows many ALTER TABLE operations to run concurrently with reads and writes, using an internal log to replay DML changes made during the operation. PostgreSQL's CREATE INDEX CONCURRENTLY builds indexes without blocking reads or writes on the table, at the cost of a slower build that scans the table twice.
The key principle: know which operations in your database version require a full table lock and which support online execution. In MySQL 8 with InnoDB, adding a column and adding a non-unique index are both online operations. Changing a column type that requires a row-format change is not. In PostgreSQL, adding a nullable column is instant (it requires only a catalog update). Adding a NOT NULL constraint on an existing column requires a full table scan to verify.
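On the PostgreSQL side, that full-scan cost can be paid without blocking writes. Since PostgreSQL 12, SET NOT NULL skips its verification scan when a valid CHECK constraint already proves the column is non-null, which enables the following sequence (a sketch; the table, column, and constraint names are illustrative):

```sql
-- PostgreSQL 12+: add NOT NULL without a long exclusive lock
ALTER TABLE users
    ADD CONSTRAINT users_display_name_not_null
    CHECK (display_name IS NOT NULL) NOT VALID;  -- brief lock, no scan

ALTER TABLE users
    VALIDATE CONSTRAINT users_display_name_not_null;  -- scans, but writes continue

ALTER TABLE users
    ALTER COLUMN display_name SET NOT NULL;  -- no scan: the valid CHECK proves it

ALTER TABLE users
    DROP CONSTRAINT users_display_name_not_null;  -- redundant once NOT NULL is set
```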
For Symfony projects, the Doctrine DBAL type system handles many of these differences, but it does not make platform-specific DDL choices on your behalf. When writing migrations for high-traffic tables, verify the DDL plan directly:
-- MySQL: request online DDL explicitly; the statement fails immediately
-- if the requested algorithm or lock level is not supported, instead of
-- silently falling back to a copying, locking ALTER
ALTER TABLE orders
    ADD COLUMN processed_at DATETIME NULL,
    ALGORITHM=INPLACE, LOCK=NONE;

-- PostgreSQL: use CONCURRENTLY for index creation
CREATE INDEX CONCURRENTLY idx_orders_processed_at
    ON orders (processed_at);
For operations that cannot run online, consider pt-online-schema-change (MySQL) or pg_repack (PostgreSQL). These tools implement online schema changes by creating a new table, copying rows in batches, and swapping the old and new tables atomically—eliminating the lock window that a direct ALTER TABLE would create.
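A typical pt-online-schema-change invocation looks roughly like this (a sketch; the database, table, and the ALTER itself are placeholders, and flags should be verified against your installed Percona Toolkit version):

```shell
# Rebuild orders online; pauses copying when the server is under load
pt-online-schema-change \
  --alter "MODIFY amount BIGINT NOT NULL" \
  --chunk-size 1000 \
  --max-load Threads_running=50 \
  --execute \
  D=shop,t=orders
```

Without --execute the tool performs a dry run, which is a useful pre-flight step in staging.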
CI/CD Pipeline Integration
Running migrations in a CI/CD pipeline introduces a synchronization problem: the migration and the code deployment need to execute in the right order, and the sequence depends on which phase of the expand-contract pattern you are in.
The general rule is:
- Expand migrations run before the new application code is deployed. The old code ignores new columns; the new schema is safe to add before the code switch.
- Contract migrations run after the new application code is deployed. Only run a drop or a NOT NULL addition after confirming the new code—which no longer references the old column—is live and stable.
In a typical GitHub Actions pipeline for a Symfony deployment:
jobs:
  deploy:
    steps:
      - name: Run expand migrations
        run: |
          php bin/console doctrine:migrations:execute --up \
            'App\Migrations\Version20260411_ExpandPhase'
      - name: Deploy application
        run: |
          # Rolling deployment: new containers replace old ones
          kubectl set image deployment/app app=$NEW_IMAGE
          kubectl rollout status deployment/app

  # Contract migrations run in a separate job,
  # triggered after the expand deployment is confirmed stable
  cleanup:
    needs: [deploy]
    if: ${{ inputs.run_contract_migration }}
    steps:
      - name: Run contract migrations
        run: |
          php bin/console doctrine:migrations:execute --up \
            'App\Migrations\Version20260411_ContractPhase'
Separating the expand and contract phases into distinct pipeline jobs—rather than running all migrations in a single pre-deployment step—enforces the discipline that contract migrations cannot run until the expand deployment has been confirmed stable.
Safe Patterns for Common Migration Types
Beyond column renames, a few migration types cause persistent trouble in production systems.
Adding a NOT NULL column to an existing table. Never add a NOT NULL constraint directly to a column on a table with existing rows. Instead: add the column as nullable, deploy code that writes the new value, backfill existing rows, then add the NOT NULL constraint in a separate migration after the backfill is complete.
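In MySQL terms, the sequence looks like this (a sketch; the orders.currency column and the 'EUR' backfill value are illustrative):

```sql
-- Migration A (expand): add as nullable, an instant operation in MySQL 8
ALTER TABLE orders ADD COLUMN currency VARCHAR(3) NULL;

-- Deploy code that writes currency on every insert/update, then backfill
-- in batches, repeating until zero rows are affected:
UPDATE orders SET currency = 'EUR' WHERE currency IS NULL LIMIT 1000;

-- Migration B (separate, after the backfill is verified complete):
ALTER TABLE orders MODIFY currency VARCHAR(3) NOT NULL;
```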
Changing a column type. Expanding the type (VARCHAR(50) → VARCHAR(200)) is typically safe and online in modern databases. Changing the data representation (VARCHAR to JSON, INT to BIGINT) requires an expand-contract approach with a transition column.
Adding a unique constraint. A unique constraint addition may fail if duplicate values exist in the column. Always check for and resolve duplicates before adding the constraint—do not rely on the migration to catch them, because a migration failure mid-deployment is harder to recover from than a pre-flight check.
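The pre-flight duplicate check is a single aggregate query (a sketch; users.email stands in for whatever column the unique constraint will cover):

```sql
-- Any rows returned must be merged or removed before adding UNIQUE(email)
SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
```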
Foreign key additions. Adding a foreign key constraint to an existing column requires that all existing values in the column already satisfy the constraint. Validate referential integrity before the migration, not during it.
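Orphaned rows can be found with an anti-join before the constraint migration runs (a sketch; assumes an orders.user_id column that will reference users.id):

```sql
-- Rows that would violate FOREIGN KEY (user_id) REFERENCES users (id)
SELECT o.id, o.user_id
FROM orders o
LEFT JOIN users u ON u.id = o.user_id
WHERE o.user_id IS NOT NULL
  AND u.id IS NULL;
```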
Monitoring and Rollback
Every migration deployment should have a defined rollback path, and that path should be validated before deployment, not during an incident.
For Doctrine Migrations, the down() method defines the rollback. In the expand phase, rollback simply drops the new column—straightforward and safe. In the contract phase, rollback is more complex and may not be fully reversible if the dropped column contained data not captured elsewhere. This asymmetry is another reason to keep the contract phase separate and to apply it only after extended stability in production.
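An expand-phase migration with its symmetric rollback might look like this (a sketch; the class name is illustrative):

```php
// Expand migration: down() is a clean inverse because, at this phase,
// the new column holds no data that exists nowhere else
final class Version20260411ExpandDisplayName extends AbstractMigration
{
    public function up(Schema $schema): void
    {
        $this->addSql('ALTER TABLE users ADD COLUMN display_name VARCHAR(255) NULL');
    }

    public function down(Schema $schema): void
    {
        $this->addSql('ALTER TABLE users DROP COLUMN display_name');
    }
}
```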
During migration execution, monitor these signals:
- Database lock wait times — a spike in innodb_row_lock_waits (MySQL), or in pg_stat_activity sessions showing wait_event_type = 'Lock' (PostgreSQL), indicates contention from a migration running longer than expected.
- Application error rates — a 5xx spike within 60 seconds of a migration run signals a compatibility issue between the schema change and running code.
- Slow query log — migrations that run as long transactions will appear here and can be killed if necessary.
A code quality audit of a Symfony application's migration history frequently surfaces migrations that would cause problems under load—NOT NULL additions without backfills, index additions without ALGORITHM=INPLACE checks, and batched updates written as single transactions that lock entire tables. Catching these before they run in production is orders of magnitude cheaper than recovering from them afterward.
Putting It Together
The patterns described here are not novel—they are well-established techniques that teams at scale have used for years. What makes them difficult in practice is discipline: the expand-contract sequence requires running three deployments where one feels sufficient, and the temptation to collapse them into a single migration under time pressure is real.
The teams that do this consistently share two practices. First, they codify the pattern in their migration review process—every migration PR is reviewed not just for correctness but for its position in the expand-contract sequence and its expected DDL behavior under load. Second, they run staging environments under realistic data volumes, so migration performance is known before production, not discovered during an incident.
If your team is working toward zero-downtime deployments and running into database migration friction, this is a solvable problem with established tooling and methodology. Wolf-Tech has helped engineering teams in Germany and across the EU implement safe deployment pipelines that include database migrations as a first-class concern. Reach us at hello@wolf-tech.io or visit wolf-tech.io for a free consultation on your deployment architecture.

