Migrating 700K Lines of Doctrine 1.x: A Schema-First Path to Doctrine 3 Without a Code Freeze

Sandor Farkas - Founder & Lead Developer at Wolf-Tech


Doctrine 1.x reached end of life in 2014. That was more than a decade ago, and yet a significant number of European mid-market PHP applications are still shipping it, usually buried under layers of custom patches, hand-rolled query builders, and tribal knowledge held by developers who have since moved on. The codebases that stick with Doctrine 1 the longest tend to be the ones where it is most tightly coupled to everything else: ActiveRecord patterns woven through 700,000 lines of controllers, service classes, and report generators that were never meant to be portable.

This post documents the approach we used on one such engagement. The client was a German B2B SaaS company — mid-market, eight-figure ARR, a PHP codebase that had been in active development since 2009. The Doctrine 1 migration was the precondition for a broader PHP 8 and Symfony 7 modernisation, and the constraint was absolute: feature work could not stop for four months while infrastructure was rebuilt. The product team had quarterly targets. The migration had to happen around them.

What follows is the playbook: the schema-first introspection strategy, the dual-ORM bridge that kept legacy code working while new code used Doctrine 3, the test harness built around database snapshots, and the cutover sequence that actually got the migration across the finish line.


Why Doctrine 1.x Is Still Around (And Why It Is Now a Liability)

Doctrine 1 uses the ActiveRecord pattern. Entities extend Doctrine_Record, and the ORM is part of the object itself rather than a separate unit of work. This made it very easy to write data access code in 2008, and very hard to migrate away from it a decade later — every model method that calls $this->save() is coupled directly to the ORM layer.

The security implications have become severe. Doctrine 1 predates modern PHP security practices, including consistent use of parameterised queries in edge cases, and because the codebase is unmaintained, any vulnerability discovered in it stays unpatched. Any application that processes customer data under GDPR, which describes virtually every European B2B SaaS, is carrying real compliance exposure by running it in 2026.

The practical problems compound over time. PHP 8.x introduced constructor promotion, named arguments, and fibers, and the Doctrine 1 codebase is incompatible with several of them at the level of internal __get/__set magic methods. Running PHP 8.3 with Doctrine 1 is possible with enough polyfills, but each PHP minor release narrows the window further.

The migration is not optional. The question is how to do it without a four-month feature freeze.


The Core Problem With YAML-First Migration Strategies

The standard Doctrine 1 migration advice is to start with the YAML schema files (schema.yml) and convert them to Doctrine 3 entity mappings, which in practice means PHP attributes or XML, since ORM 3 no longer supports docblock annotations. In theory, you read the YAML, generate the entities, and iterate from there.

In practice, this fails badly on large codebases. After years of manual database changes, hotfixes applied directly to production, and developers who did not update the YAML when they altered a table, the schema.yml files in a 700K-line codebase are not an accurate description of the database. They describe the database as it was intended to be at some point in the past, with divergences that no one has tracked systematically.

We discovered this when a junior developer on the client team spent three weeks generating Doctrine 3 entities from the YAML and then ran the first integration test suite against staging. 23% of queries produced wrong results or fatal type errors — not because the migration was done badly, but because the source of truth was wrong.

The fix is to discard the YAML as the authoritative source and treat the live database schema as the ground truth instead.


Step 1: Schema-First Introspection

Instead of converting YAML to entities, we introspected the production database directly using Doctrine's SchemaManager and a custom generator script. The process:

  1. Point a Doctrine DBAL Connection (the DBAL release used by ORM 3) at a read-only replica of production.
  2. Call $connection->createSchemaManager()->introspectSchema() to get a Schema object representing every table, column, index, and foreign key as Doctrine 3 understands them.
  3. Run a custom generator (around 400 lines of PHP) that walks the Schema and emits entity classes with #[Entity], #[Table], #[Column], and #[ManyToOne]/#[OneToMany] attributes inferred from foreign key constraints.
  4. Output the entities into a src/Entity/Legacy/ namespace, separate from any new entities the team was building in parallel.

The critical difference from YAML conversion: every property type is derived from the actual column definition, not from a YAML file that might declare a column as type: integer when the database actually stores it as VARCHAR(255) because someone changed the data type without a migration.
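The core of such a generator is small. The following is a minimal sketch of the property-emitting step, not the generator used on this engagement. It assumes each column has already been reduced to a plain array (name, DBAL type name, nullable, length), which is a hypothetical shape standing in for what a DBAL Column object exposes through getName(), getType(), getNotnull(), and getLength(). The function names are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Map a DBAL type name to the PHP type of the generated property.
 * Only a few types are covered; DECIMAL stays a string because Doctrine
 * hydrates decimals as strings to avoid float rounding.
 */
function phpTypeFor(string $dbalType): string
{
    return match ($dbalType) {
        'integer', 'smallint' => 'int',
        'boolean' => 'bool',
        'float' => 'float',
        'date', 'datetime' => '\DateTime',
        default => 'string', // string, text, decimal, and anything unmapped
    };
}

/** Convert a snake_case column name to a camelCase property name. */
function propertyName(string $column): string
{
    return lcfirst(str_replace('_', '', ucwords($column, '_')));
}

/**
 * Emit one attribute-mapped property.
 * $column is a hypothetical array shape: name, type, nullable, length.
 */
function renderProperty(array $column): string
{
    $args = [
        sprintf("name: '%s'", $column['name']),
        sprintf("type: '%s'", $column['type']),
    ];
    if (($column['length'] ?? null) !== null) {
        $args[] = sprintf('length: %d', $column['length']);
    }
    if ($column['nullable']) {
        $args[] = 'nullable: true';
    }
    $phpType = ($column['nullable'] ? '?' : '') . phpTypeFor($column['type']);

    return sprintf(
        "    #[ORM\\Column(%s)]\n    private %s \$%s%s;\n",
        implode(', ', $args),
        $phpType,
        propertyName($column['name']),
        $column['nullable'] ? ' = null' : ''
    );
}

// The VARCHAR(255) column that the YAML still described as an integer:
echo renderProperty([
    'name' => 'customer_ref',
    'type' => 'string',
    'length' => 255,
    'nullable' => false,
]);
```

Because the input comes from the live column definition, the emitted property is a string even if schema.yml still claims the column is an integer.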

The generator also emitted a DISCREPANCIES.md file listing every place where the introspected schema conflicted with the existing YAML: 847 discrepancies in this codebase, including 14 missing tables, 62 columns with wrong types, and 9 foreign keys that existed in the database but were absent from the YAML. That document became the source of record for the migration team's backlog.
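The comparison behind a report like this can be sketched in a few lines. This is an illustration under simplifying assumptions rather than the actual generator code: both schemas are reduced to hypothetical arrays of the form [table => [column => type]], only the database-to-YAML direction is checked, and the line format is invented for the example.

```php
<?php
declare(strict_types=1);

/**
 * Compare the declared (YAML) schema with the live (introspected) one.
 * Both inputs are hypothetical arrays shaped as [table => [column => type]].
 *
 * @return list<string> one human-readable line per discrepancy
 */
function findDiscrepancies(array $yaml, array $live): array
{
    $found = [];
    foreach ($live as $table => $columns) {
        if (!isset($yaml[$table])) {
            $found[] = "MISSING_TABLE $table: in database, absent from YAML";
            continue;
        }
        foreach ($columns as $column => $type) {
            $declared = $yaml[$table][$column] ?? null;
            if ($declared === null) {
                $found[] = "MISSING_COLUMN $table.$column: in database, absent from YAML";
            } elseif ($declared !== $type) {
                $found[] = "TYPE_MISMATCH $table.$column: YAML says $declared, database says $type";
            }
        }
    }
    return $found;
}

$yaml = ['invoice' => ['id' => 'integer', 'customer_ref' => 'integer']];
$live = [
    'invoice' => ['id' => 'integer', 'customer_ref' => 'string', 'notes' => 'text'],
    'audit_log' => ['id' => 'integer'],
];

// Print the report, one discrepancy per line.
echo implode("\n", findDiscrepancies($yaml, $live)), "\n";
```

A fuller version would also walk the YAML to find tables and columns that no longer exist in the database, and would compare indexes and foreign keys the same way.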


Step 2: The Dual-ORM Bridge

Generating correct entities is the easy part. The hard part is that 700,000 lines of production code reference Doctrine_Record subclasses directly, and you cannot rewrite all of it before the migration deadline.

The dual-ORM bridge is the solution. The idea is straightforward: run both Doctrine 1 and Doctrine 3 simultaneously inside the same Symfony application, with a routing layer that determines which ORM handles a given data access call.

The implementation has three components.

Connection isolation. Both ORMs connect to the same database, but through separate connection objects with separate configuration. Doctrine 1 uses its own Doctrine_Manager singleton as always; Doctrine 3 uses a standard Symfony EntityManager registered as a service. They do not share connection state.

Write coordination. The largest risk in a dual-ORM setup is that each ORM maintains its own identity map and unit of work, and neither knows about the other's pending changes. If Doctrine 1 writes a record and Doctrine 3 reads it back from its own cache before the transaction commits, you see stale data. We addressed this by marking all Doctrine 3 entities in the bridge period as non-cacheable at the second-level cache layer, and by always flushing Doctrine 1's connection before any Doctrine 3 read in the same request.

A migration flag per aggregate root. We added a migration_state column (with values legacy, in_progress, migrated) to every table that was being actively ported. New code checked this flag before deciding which ORM to use for a given record. This let us migrate data at the row level rather than the table level — high-value customer accounts could be moved to Doctrine 3 entities first, with older dormant accounts following in batch jobs.
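The per-row routing decision can be sketched as a thin decorator. Everything named here is hypothetical: the AccountReader interface, the BridgedAccountReader class, and the state lookup closure are stand-ins for illustration, and the choice to keep in_progress rows on the legacy path is an assumption of the sketch rather than a rule stated above.

```php
<?php
declare(strict_types=1);

/** The three values of the migration_state column. */
enum MigrationState: string
{
    case Legacy = 'legacy';
    case InProgress = 'in_progress';
    case Migrated = 'migrated';
}

/** Hypothetical read interface, implemented once per ORM. */
interface AccountReader
{
    public function find(int $id): array;
}

/** Routes each lookup to the ORM that currently owns the row. */
final class BridgedAccountReader implements AccountReader
{
    /** @param \Closure(int): string $stateLookup returns the raw migration_state for an id */
    public function __construct(
        private AccountReader $legacy,
        private AccountReader $modern,
        private \Closure $stateLookup,
    ) {
    }

    public function find(int $id): array
    {
        $state = MigrationState::from(($this->stateLookup)($id));

        // Assumption: rows that are mid-migration stay on the legacy path
        // until they are explicitly marked as migrated.
        return $state === MigrationState::Migrated
            ? $this->modern->find($id)
            : $this->legacy->find($id);
    }
}

// Stub readers standing in for the Doctrine 1 and Doctrine 3 code paths.
$legacy = new class implements AccountReader {
    public function find(int $id): array
    {
        return ['id' => $id, 'via' => 'doctrine1'];
    }
};
$modern = new class implements AccountReader {
    public function find(int $id): array
    {
        return ['id' => $id, 'via' => 'doctrine3'];
    }
};

$states = [1 => 'migrated', 2 => 'legacy', 3 => 'in_progress'];
$reader = new BridgedAccountReader($legacy, $modern, fn (int $id): string => $states[$id]);

echo $reader->find(1)['via'], "\n";
echo $reader->find(2)['via'], "\n";
```

Keeping the decision behind one interface means calling code does not change when a row flips from legacy to migrated, and the decorator can be deleted in one step at cutover.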

The bridge added roughly 12 milliseconds of overhead per request on the busiest controllers because of the flag check and the explicit cache invalidation. That overhead was acceptable for a four-month window and was removed entirely at cutover.


Step 3: The Snapshot Test Harness

Migrating an ORM in a codebase with partial test coverage requires a safety net that does not depend on having good unit tests for the code being migrated. Database snapshot fixtures gave us that safety net.

The approach: before touching any controller or service class, we wrote a script that took the full set of SQL queries produced by the code under migration, ran them against staging, captured the result sets as JSON fixtures, and stored them in tests/snapshots/. Each snapshot file contained the query, the parameters, and the expected result set.

During migration, instead of writing new unit tests (which would have required understanding the legacy business logic in detail), the test suite replayed every captured query against the Doctrine 3 entity layer and compared the result sets. A passing snapshot test meant that the Doctrine 3 implementation returned identical data to what the legacy Doctrine 1 code had returned.
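A replay check of this kind can be quite small. The sketch below assumes a snapshot is a JSON object with query, params, and expected keys, which matches the description above but is not the exact fixture format used on the project. The $fetch callable is a hypothetical hook that runs the Doctrine 3 implementation of the query and returns rows as plain arrays.

```php
<?php
declare(strict_types=1);

/**
 * Normalise a result set so the comparison ignores key order within a row.
 * Row order is kept, since ORDER BY is part of a query's contract.
 */
function normaliseRows(array $rows): array
{
    return array_map(static function (array $row): array {
        ksort($row);
        return $row;
    }, $rows);
}

/**
 * Replay one snapshot against the new implementation.
 *
 * @return list<string> mismatch descriptions; an empty list means a pass
 */
function replaySnapshot(array $snapshot, callable $fetch): array
{
    $expected = normaliseRows($snapshot['expected']);
    $actual = normaliseRows($fetch($snapshot['query'], $snapshot['params']));

    if (count($expected) !== count($actual)) {
        return [sprintf('row count: expected %d, got %d', count($expected), count($actual))];
    }

    $mismatches = [];
    foreach ($expected as $i => $row) {
        // Strict comparison on purpose: "19.90" versus 19.9 is a type
        // regression even though the values look equal.
        if ($row !== $actual[$i]) {
            $mismatches[] = sprintf(
                'row %d: expected %s, got %s',
                $i,
                json_encode($row),
                json_encode($actual[$i])
            );
        }
    }
    return $mismatches;
}

$snapshot = json_decode(
    '{"query":"SELECT id, total FROM invoice WHERE account_id = ?",'
    . '"params":[42],"expected":[{"id":1,"total":"19.90"}]}',
    true,
    512,
    JSON_THROW_ON_ERROR
);

// Stub standing in for the Doctrine 3 code path; column order differs on purpose.
$fetch = fn (string $sql, array $params): array => [['total' => '19.90', 'id' => 1]];

echo replaySnapshot($snapshot, $fetch) === [] ? "snapshot passes\n" : "snapshot fails\n";
```

The strict comparison is the design choice that matters here: a loose comparison would hide exactly the kind of type drift that an ORM change tends to introduce.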

This approach has limits — it catches query result regressions but not business logic bugs that existed before the migration. We were explicit with the client about this distinction. The snapshot tests were a migration fidelity guarantee, not a comprehensive functional test suite. Building that suite was a separate workstream that ran in parallel.


Step 4: The Cutover Sequence

The migration ran in four phases across four months, each phase targeting a slice of the application by domain rather than by technical layer.

Month 1: Reporting module. Read-heavy, no writes from user-facing flows, isolated from the transaction core. Low risk. We migrated 14 report generators to Doctrine 3 entities, ran the snapshot tests, deployed to staging, and held for two weeks of parallel operation where both the legacy and new code ran and their outputs were compared in production logs.

Month 2: User and account management. Medium complexity. The User, Organisation, and Subscription aggregates were the most thoroughly unit-tested parts of the legacy codebase, which made regression detection easier. The dual-ORM bridge's migration_state flag let us move accounts to Doctrine 3 incrementally, starting with internal test accounts and expanding to production over three weeks.

Month 3: Billing and invoicing. The most sensitive domain. We ran an extended parallel period of six weeks rather than two, with a daily reconciliation job comparing Doctrine 1 and Doctrine 3 invoice totals for every account. During the first week the job surfaced two type coercion bugs (decimal columns returning strings in an edge case involving zero-value line items). Once those were fixed, it found zero discrepancies.
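A reconciliation check along these lines is sketched below. It is not the job that ran in production. It assumes totals arrive keyed by account id and may be strings, ints, or floats depending on which ORM produced them, so each value is converted to integer minor units before comparing. The function names and the two-decimal currency assumption are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Convert a monetary amount to integer minor units (cents).
 * Accepts "0", "0.00", "19.9", 0, or 19.9, so the representation an ORM
 * happens to return does not affect the comparison.
 */
function toCents(string|int|float $amount): int
{
    $text = (string) $amount;
    if (preg_match('/^(-?)(\d+)(?:\.(\d{1,2}))?\z/', $text, $m) !== 1) {
        throw new InvalidArgumentException("Unsupported amount: $text");
    }
    // Pad the fraction on the right so "19.9" becomes 90 cents, not 9.
    $cents = (int) $m[2] * 100 + (int) str_pad($m[3] ?? '', 2, '0');

    return $m[1] === '-' ? -$cents : $cents;
}

/**
 * Compare per-account invoice totals from both ORMs.
 *
 * @return array<int, string> account id => description of the difference
 */
function reconcile(array $legacyTotals, array $modernTotals): array
{
    $diffs = [];
    foreach ($legacyTotals as $accountId => $legacy) {
        if (!array_key_exists($accountId, $modernTotals)) {
            $diffs[$accountId] = 'missing from Doctrine 3 result';
            continue;
        }
        $modern = $modernTotals[$accountId];
        if (toCents($legacy) !== toCents($modern)) {
            $diffs[$accountId] = sprintf('Doctrine 1: %s, Doctrine 3: %s', $legacy, $modern);
        }
    }
    return $diffs;
}

$legacyTotals = [101 => '0.00', 102 => '19.90', 103 => '250.00'];
$modernTotals = [101 => 0, 102 => '19.9', 103 => '205.00'];

// Only account 103 differs in value; 101 and 102 differ in representation only.
print_r(reconcile($legacyTotals, $modernTotals));
```

Comparing in minor units separates genuine differences in value from differences in representation, which are worth logging separately because they point at type coercion problems rather than wrong totals.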

Month 4: Core product module and Doctrine 1 removal. The final batch included the product's primary workflow engine, the most complex code in the codebase. By this stage the team had three months of migration experience and the snapshot test suite covered 340 query patterns. The actual migration took three weeks; the final week was spent removing the Doctrine 1 dependency, the bridge layer, and the migration_state columns from every table.

The total result: Doctrine 1 removed from a 700K-line production codebase over four months, with no feature freeze, no production incidents requiring rollback, and a test suite that was meaningfully larger at the end than at the start.


What We Would Do Differently

Start snapshot capture earlier. We began capturing query snapshots two weeks into month one, after the introspection phase was complete. In retrospect, we should have started capturing on the first day we had read access to staging — even before any migration code was written. The more baseline snapshots you have, the cheaper the safety net becomes.

Automate the discrepancy backlog triage. The 847-item DISCREPANCIES.md file was triaged manually by a developer over the course of three days. A simple classification script that sorted discrepancies by table access frequency (derived from application logs) would have reduced triage time to half a day and ensured that the highest-traffic tables were addressed first.
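A classification script of that kind could look like the sketch below. It was not built on this engagement, so every detail is an assumption: discrepancies are hypothetical arrays with table and issue keys, and the application log is assumed to contain raw SQL, one statement per line.

```php
<?php
declare(strict_types=1);

/**
 * Count how many logged statements mention each known table.
 * Word boundaries keep "invoice" from matching "invoice_line".
 *
 * @return array<string, int> table name => number of log lines mentioning it
 */
function countTableHits(array $logLines, array $tables): array
{
    $hits = array_fill_keys($tables, 0);
    foreach ($logLines as $line) {
        foreach ($tables as $table) {
            if (preg_match('/\b' . preg_quote($table, '/') . '\b/i', $line) === 1) {
                $hits[$table]++;
            }
        }
    }
    return $hits;
}

/** Order discrepancies so the most frequently accessed tables come first. */
function triage(array $discrepancies, array $hitsByTable): array
{
    usort(
        $discrepancies,
        static fn (array $a, array $b): int =>
            ($hitsByTable[$b['table']] ?? 0) <=> ($hitsByTable[$a['table']] ?? 0)
    );
    return $discrepancies;
}

$log = [
    'SELECT * FROM invoice WHERE account_id = 42',
    'SELECT * FROM invoice_line WHERE invoice_id = 7',
    'UPDATE invoice SET total = 19.90 WHERE id = 7',
    'SELECT * FROM audit_log LIMIT 10',
];
$discrepancies = [
    ['table' => 'audit_log', 'issue' => 'missing from YAML'],
    ['table' => 'invoice', 'issue' => 'customer_ref type mismatch'],
];

$hits = countTableHits($log, ['invoice', 'invoice_line', 'audit_log']);

// Print the backlog with the busiest tables at the top.
foreach (triage($discrepancies, $hits) as $item) {
    printf("%s (%d hits): %s\n", $item['table'], $hits[$item['table']], $item['issue']);
}
```

Matching table names with a regular expression is crude and will miscount aliased or dynamically built queries, but for ordering a backlog a rough frequency signal is usually enough.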

Consider Doctrine 2 as an intermediate step for teams with less migration experience. We went directly from Doctrine 1 to Doctrine 3, which is the right choice for a greenfield migration but requires the team to absorb two major version changes simultaneously. For teams less familiar with the Doctrine 3 identity map and type coercion changes, a Doctrine 2 intermediate step reduces the learning surface — at the cost of running the migration twice.


Getting Started

If your codebase is still on Doctrine 1.x and the migration is on the roadmap but not yet started, the most valuable thing you can do this week is run the schema introspection against a staging replica and generate a discrepancy report. You do not need to commit to a migration timeline to do that. The report will tell you how far the YAML has drifted from the live database and give you the first realistic estimate of migration complexity.

If you need an experienced team to run the migration — or to review your approach before you start — wolf-tech.io specialises in exactly this kind of legacy PHP modernisation. We have done this migration on codebases ranging from 80K to 1.2M lines, and we can scope the work honestly based on what your schema introspection actually shows rather than optimistic estimates from YAML files that may not match your database.

Reach out at hello@wolf-tech.io — we are happy to start with a no-obligation schema review if that is useful.