Symfony Messenger at Scale: Backpressure, Poison-Pill Handling, and the Rate-Limit Patterns That Save Your Database

Sandor Farkas - Founder & Lead Developer at Wolf-Tech


The setup is familiar. You wire up Symfony Messenger for async processing — webhook receivers, email dispatch, background sync jobs — and in development it works exactly as advertised. Then you deploy to production, a payment provider sends a burst of 8,000 webhook events across five minutes, and your monitoring shows database connections exhausted, worker processes spinning in failure loops, and an alert queue that will take hours to drain.

What happened? One slow downstream API turned a straightforward message handler into a backpressure amplifier. Retry logic designed for occasional failures kicked in at scale and saturated the connection pool. A single malformed event — the kind your schema validator never anticipated — got retried thousands of times while legitimate messages piled up behind it.

These are not edge cases. They are the predictable failure modes of Messenger deployments that have not yet been hardened for production scale. The patterns below are what we install in client systems handling 1M+ messages per day.

The Core Problem: Retry Amplification and Transport Backpressure

Symfony Messenger's default retry configuration is generous: three attempts with a multiplier of two, starting at one second. For systems processing tens of messages per minute, this is fine. For systems under sustained load with a slow dependency — an external API at 400ms average latency, a Doctrine listener that fires a secondary query on every consumed message — retries do not smooth the load, they amplify it.
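
Spelled out, those defaults look like this; the transport name and DSN are illustrative, the retry_strategy values are what you get with no configuration at all:

framework:
    messenger:
        transports:
            async:
                dsn: '%env(REDIS_DSN)%'
                retry_strategy:
                    max_retries: 3
                    delay: 1000      # 1s before the first retry
                    multiplier: 2    # then 2s, then 4s
                    max_delay: 0     # no ceiling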

The compound effect is a pattern we call retry amplification: one bad message triggers three full retry cycles, each retry re-enters the queue when its delay expires and consumes another handler slot, and all the while other workers keep pulling fresh messages. Under high ingest rates, retry traffic compounds with new traffic and can saturate the worker pool before the dead-letter threshold is ever reached.

The fix has two parts: rate limiting at the transport level so that workers self-throttle based on downstream health, and fast failure routing that removes bad messages from the main queue before they consume retry budget.

Per-Transport Rate Limiting with Exponential Backoff and Jitter

Symfony Messenger does not ship a built-in transport rate limiter, but the RateLimiterFactory from the symfony/rate-limiter component integrates cleanly through a custom middleware that throttles each envelope on the consume side, before it reaches its handler.

The pattern that works well in production is a handler-level rate limiter that blocks the current worker process rather than re-queuing the message:

use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;
use Symfony\Component\Messenger\Stamp\ReceivedStamp;
use Symfony\Component\RateLimiter\RateLimiterFactory;

final class RateLimitedHandlerMiddleware implements MiddlewareInterface
{
    public function __construct(
        private readonly RateLimiterFactory $externalApiLimiter,
        private readonly LoggerInterface $logger,
    ) {}

    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        // Throttle only on the consume side: the ReceivedStamp is present when
        // a worker pulled the envelope from a transport, not on initial dispatch.
        if ($envelope->last(ReceivedStamp::class) !== null
            && $envelope->getMessage() instanceof ExternalApiMessage
        ) {
            $limiter = $this->externalApiLimiter->create('global');

            if (!$limiter->consume(1)->isAccepted()) {
                // Block this worker until a token is available
                $limiter->reserve(1)->wait();
                $this->logger->debug('Rate limit applied, worker paused');
            }
        }

        return $stack->next()->handle($envelope, $stack);
    }
}
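
The middleware only runs once it is registered on the bus. A minimal registration, assuming the class lives in App\Messenger\Middleware and is autowired; custom entries are appended to the default middleware stack:

framework:
    messenger:
        buses:
            messenger.bus.default:
                middleware:
                    - App\Messenger\Middleware\RateLimitedHandlerMiddleware

With the limiter named external_api as below, the RateLimiterFactory $externalApiLimiter constructor argument autowires by name.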

Register the rate limiter in config/packages/rate_limiter.yaml:

framework:
    rate_limiter:
        external_api:
            policy: token_bucket
            limit: 100
            rate: { interval: '1 minute', amount: 100 }

This caps messages of that type at 100 per minute across the whole worker fleet, provided the limiter state lives in storage all workers share; the default cache-backed storage is typically per-host, so multi-host fleets need a shared backend. When combined with supervisor's worker count, you can express your external API SLA directly in configuration rather than hoping the downstream survives whatever burst arrives.
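
A sketch of pointing the limiter at a shared Redis-backed pool; the pool name is ours, and cache_pool is the rate limiter's standard option for overriding its storage:

framework:
    cache:
        pools:
            cache.rate_limiter_redis:
                adapter: cache.adapter.redis
                provider: '%env(REDIS_DSN)%'
    rate_limiter:
        external_api:
            policy: token_bucket
            limit: 100
            rate: { interval: '1 minute', amount: 100 }
            cache_pool: cache.rate_limiter_redis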

For exponential backoff with jitter — critical when the downstream is recovering from an outage and you want to avoid a thundering herd on reconnect — override the default retry strategy per transport:

framework:
    messenger:
        transports:
            external_api:
                dsn: '%env(REDIS_DSN)%'
                retry_strategy:
                    max_retries: 5
                    delay: 1000
                    multiplier: 2
                    max_delay: 60000
                    jitter: 0.2

The jitter: 0.2 adds up to 20% random variance to each retry delay (the jitter option landed in the retry strategy in Symfony 7.1; older versions need a custom retry strategy for the same effect). With a multiplier of 2 starting at one second, retry slots fall at roughly 1s, 2s, 4s, 8s, and 16s, with enough spread that a hundred workers retrying simultaneously do not all hammer the recovered service at the same instant.

Dead-Letter Queues with Structured Retry Classification

The default Messenger dead-letter setup routes all failed messages to a single failed transport, which creates a flat queue where a schema validation failure sits next to a network timeout and a business logic exception — three problems with completely different recovery paths.

The pattern we install is a multi-tier dead-letter system driven by exception type:

use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Exception\HandlerFailedException;
use Symfony\Component\Messenger\Exception\RecoverableMessageHandlingException;
use Symfony\Component\Messenger\Exception\UnrecoverableMessageHandlingException;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;

final class ClassifiedFailureHandler implements MiddlewareInterface
{
    public function handle(Envelope $envelope, StackInterface $stack): Envelope
    {
        try {
            return $stack->next()->handle($envelope, $stack);
        } catch (HandlerFailedException $e) {
            // Handler exceptions arrive wrapped in HandlerFailedException; classify the
            // wrapped exceptions (getWrappedExceptions() is the Symfony 6.4+ API).
            foreach ($e->getWrappedExceptions() as $wrapped) {
                if ($wrapped instanceof TransientExternalApiException) {
                    // Retry with backoff
                    throw new RecoverableMessageHandlingException($wrapped->getMessage(), previous: $wrapped);
                }
                if ($wrapped instanceof PermanentValidationException) {
                    // Route to the validation DLQ immediately, do not retry
                    throw new UnrecoverableMessageHandlingException($wrapped->getMessage(), previous: $wrapped);
                }
            }

            // All other exceptions keep the default retry strategy
            throw $e;
        }
    }
}

Throwing an UnrecoverableMessageHandlingException causes Messenger to skip the retry strategy entirely and route the envelope directly to the failure transport. Throwing a RecoverableMessageHandlingException does the opposite: the message is retried regardless of the max_retries ceiling, so reserve it for failures you know will eventually clear.
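
The same classification works without a middleware: a handler can throw the marker exception itself. A sketch, with the message class and validator as stand-ins:

use Symfony\Component\Messenger\Exception\UnrecoverableMessageHandlingException;

final class ProcessWebhookHandler
{
    public function __construct(
        private readonly WebhookPayloadValidator $validator, // stand-in service
    ) {}

    public function __invoke(ProcessWebhookMessage $message): void
    {
        if (!$this->validator->isValid($message->payload)) {
            // No retry budget spent; the envelope goes straight to the failure transport
            throw new UnrecoverableMessageHandlingException('Webhook payload failed schema validation');
        }

        // ... normal processing continues here
    }
}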

The transport configuration defines a separate DLQ destination per failure class:

framework:
    messenger:
        failure_transport: failed_default
        transports:
            failed_default:
                dsn: '%env(REDIS_DSN)%/failed:default'
            failed_validation:
                dsn: '%env(REDIS_DSN)%/failed:validation'
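
Defining failed_validation does not route anything to it by itself; the built-in failure listener only knows the single failure_transport. One way to make the split real is a listener on WorkerMessageFailedEvent that forwards validation failures to the second transport. A sketch, assuming the service id messenger.transport.failed_validation that Messenger derives from the transport name, and the PermanentValidationException from the middleware above:

use Symfony\Component\DependencyInjection\Attribute\Autowire;
use Symfony\Component\EventDispatcher\Attribute\AsEventListener;
use Symfony\Component\Messenger\Event\WorkerMessageFailedEvent;
use Symfony\Component\Messenger\Transport\Sender\SenderInterface;

#[AsEventListener]
final class RouteValidationFailuresToDlq
{
    public function __construct(
        #[Autowire(service: 'messenger.transport.failed_validation')]
        private readonly SenderInterface $validationDlq,
    ) {}

    public function __invoke(WorkerMessageFailedEvent $event): void
    {
        if ($event->willRetry()) {
            return; // retries still pending, nothing to route yet
        }

        // Walk the exception chain looking for the validation marker
        $e = $event->getThrowable();
        while ($e !== null && !$e instanceof PermanentValidationException) {
            $e = $e->getPrevious();
        }

        if ($e instanceof PermanentValidationException) {
            $this->validationDlq->send($event->getEnvelope());
        }
    }
}

Bear in mind the stock failure listener will still copy the same message to failed_default; during triage that duplicate is usually tolerable, or you can disable the global failure transport for the transports this listener covers.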

Combined with a custom stamp that records the original exception class and attempt count, your ops team has a DLQ where every entry is actionable: validation failures go to a data-fix workflow, transient failures can be bulk-retried with messenger:failed:retry, and permanent infrastructure errors get a separate alert threshold.
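
The stamp itself is a small value object; the class name and fields here are ours:

use Symfony\Component\Messenger\Stamp\StampInterface;

// Attached before re-dispatch so every DLQ entry carries its own diagnosis
final class FailureContextStamp implements StampInterface
{
    public function __construct(
        public readonly string $exceptionClass,
        public readonly int $attempts,
    ) {}
}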

Doctrine Connection Pool Tuning for Worker Pools

This is the failure mode that most teams discover last, usually under unexpected load. Each Messenger worker process holds at least one Doctrine connection open for the duration of its life. With the default Doctrine DBAL configuration and a typical PostgreSQL max_connections of 100, ten supervisor workers holding three connections each account for thirty, and a comparably sized web server pool brings the total to sixty consumed connections before any burst traffic arrives.

The Doctrine Messenger middleware that auto-opens connections on handler execution makes the problem worse: if your handler fires five queries and PostgreSQL is under contention, the middleware keeps the connection checked out for the full handler duration, while other workers queue up for the remaining PostgreSQL connection slots.

The fix is three-pronged. First, add the Doctrine middleware to the message bus so that connections are properly closed between messages:

framework:
    messenger:
        buses:
            messenger.bus.default:
                middleware:
                    - doctrine_transaction
                    - doctrine_ping_connection
                    - doctrine_close_connection

The doctrine_close_connection middleware explicitly closes the connection after each message is processed, releasing it back to PostgreSQL. Without it, long-running workers hold idle connections that PostgreSQL counts against max_connections even while the worker sits between messages doing nothing.

Second, make sure worker connections are not persistent, in config/packages/doctrine.yaml, so that a closed connection is actually released on the server side (the --time-limit flag covered below is what ultimately caps per-worker connection lifetime):

doctrine:
    dbal:
        options:
            # Persistent connections outlive the handler that opened them and
            # defeat doctrine_close_connection; keep them off for workers
            !php/const PDO::ATTR_PERSISTENT: false

Third, run this query on your PostgreSQL instance to surface the actual connection distribution before you ever tune supervisor worker counts:

SELECT application_name, state, count(*) AS connections
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY application_name, state
ORDER BY connections DESC;

If you see a large number of idle connections from your worker application name, doctrine_close_connection is not configured, or your pool ceiling is too high. If you see idle in transaction, a handler is holding a transaction open across a slow network call — isolate that handler and defer the Doctrine operation until after the external call completes.
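
In handler terms, deferring the Doctrine operation means the slow call completes before any transaction opens. A sketch with stand-in names throughout (the message, API client, and entity are ours); a handler written this way should also not sit behind the doctrine_transaction middleware, which would wrap the whole method in one transaction:

use Doctrine\ORM\EntityManagerInterface;

final class SyncAccountHandler
{
    public function __construct(
        private readonly ExternalApiClient $apiClient,
        private readonly EntityManagerInterface $entityManager,
    ) {}

    public function __invoke(SyncAccountMessage $message): void
    {
        // Slow network call first, outside any transaction
        $payload = $this->apiClient->fetchAccount($message->accountId);

        // The transaction covers only the short write path
        $this->entityManager->wrapInTransaction(function () use ($payload): void {
            $account = $this->entityManager->find(Account::class, $payload['id']);
            $account->applySnapshot($payload);
        });
    }
}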

Transport Selection: In-Memory vs Redis vs AMQP Under Load

For production systems handling sustained volume, the transport selection matters beyond the obvious durability trade-off.

In-memory transport is appropriate only for development and integration tests. Under any production load it drops messages on worker restart and provides no visibility into queue depth.

Redis transport is the right default for most Symfony teams. It offers persistence (with appendonly yes), fast throughput, and a native queue-depth metric that integrates cleanly with Grafana. For most systems below 50,000 messages per minute, Redis saturates last — the bottleneck is almost always the handler logic or downstream dependencies.
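
To sample that queue depth by hand rather than through Grafana, recent Symfony versions ship a stats command, and the Redis stream can be inspected directly (stream name assumed to be the transport default, messages):

php bin/console messenger:stats

# or straight from Redis
redis-cli XLEN messages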

AMQP (RabbitMQ) adds value when you need fan-out routing — one message to multiple consumers — or when you want exchange-level dead-lettering with TTL-based retry scheduling at the broker level rather than the application level. The Symfony AMQP transport supports dead-letter exchanges natively:

framework:
    messenger:
        transports:
            async:
                dsn: '%env(AMQP_DSN)%'
                options:
                    exchange:
                        name: app
                        type: direct
                    queues:
                        messages:
                            arguments:
                                x-dead-letter-exchange: app.dlx
                                x-message-ttl: 30000

For systems where a Redis outage would cause unacceptable data loss on the queue, AMQP with replicated queues (classic mirrored queues, or quorum queues on current RabbitMQ releases) provides stronger durability guarantees. For most B2B SaaS applications, Redis with persistence enabled is sufficient and operationally simpler.

Supervisord Configuration for Production Worker Pools

Getting worker count right requires knowing your message processing time distribution, not just the average. A handler with a p99 latency of 800ms at the same worker count as a handler with a p99 of 50ms will produce completely different queue depth behaviour under burst load.

A supervisord configuration that accounts for these differences uses separate programs per logical message category:

[program:messenger-default]
command=php /var/www/html/bin/console messenger:consume async --time-limit=3600 --memory-limit=256M
process_name=%(program_name)s_%(process_num)02d
numprocs=4
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/messenger-default.log
stopwaitsecs=60

[program:messenger-external-api]
command=php /var/www/html/bin/console messenger:consume external_api --time-limit=3600 --memory-limit=256M
process_name=%(program_name)s_%(process_num)02d
numprocs=2
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/messenger-external-api.log
stopwaitsecs=60

The process_name expansion is required whenever numprocs is greater than one; supervisord refuses to start the program without it. The --time-limit=3600 makes each worker exit gracefully after an hour, and autorestart brings up a fresh process. This is essential to prevent memory creep in long-running PHP processes: the --memory-limit flag is only checked between messages, so a handler that accumulates objects across thousands of messages can still hit PHP's hard memory_limit mid-message and die uncleanly.

The stopwaitsecs=60 gives workers time to finish their current message before supervisor sends SIGKILL. Without it, supervisor may kill a worker mid-transaction, leaving a message in an ambiguous state.

For systems where message processing time is highly variable — jobs that occasionally run for ten minutes alongside jobs that complete in milliseconds — consider separate transport queues and supervisor programs for long-running handlers. This prevents slow jobs from blocking the backlog of fast ones, which is the async equivalent of head-of-line blocking.
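
Splitting the long-running handlers out is a routing change rather than a code change. A sketch with a hypothetical report message (the transport and class names are ours):

framework:
    messenger:
        transports:
            async_slow:
                dsn: '%env(REDIS_DSN)%/slow'
        routing:
            App\Message\GenerateLargeReportMessage: async_slow

A matching [program:messenger-slow] supervisor block with a low numprocs and a longer stopwaitsecs completes the isolation.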

The Patterns That Compound

Individually, each of these patterns addresses one failure mode. Combined, they give you a worker pool that self-throttles when downstream systems are under pressure, fails fast on unrecoverable messages, routes failures to actionable queues, and does not consume database connections it is not actively using.

If your Messenger deployment is showing signs of stress — growing queue depth under load, unexplained database connection spikes, or worker processes cycling on retry loops — a code quality audit is often the fastest way to identify which pattern is missing and what the remediation priority should be. These configurations take an afternoon to implement correctly; diagnosing them in a production incident takes considerably longer.

Reach out at hello@wolf-tech.io or visit wolf-tech.io if you want to discuss your Messenger architecture before the next incident does it for you.