LLM Agents on the Symfony Workflow Component: State Machines, Idempotent Steps, and Replayable Failures

Sandor Farkas - Founder & Lead Developer at Wolf-Tech


A German logistics SaaS company shipped an LLM-powered document processing agent last winter. The agent extracted structured data from freight invoices, validated fields against a rules engine, and pushed records into their ERP. It worked in staging. In production, it started failing on day two — not because the model made wrong extractions, but because the HTTP call to the ERP sometimes timed out after the agent had already written to a staging table, leaving the system in a half-committed state with no clean way to replay. The team had built a capable model pipeline and a fragile orchestration layer. Those are two separate problems, and only one of them is the model's fault.

The Symfony Workflow Component was designed for exactly the orchestration half. Originally built to model business processes — order state machines, document approval flows, subscription lifecycles — it turns out to be a strong fit for LLM agent loops. Explicit places and transitions give you replayable state. Guards let you enforce preconditions and policy. The event system doubles as a structured audit log. Symfony Messenger handles retries, backpressure, and dead-letter queues. This post walks through a complete pattern for building a Symfony Workflow LLM agent that can survive real production failures.

Why Agent Orchestration Is a Harder Problem Than It Looks

The popular framing of LLM agents treats them as loops: observe → reason → act → observe. That framing is accurate for the happy path. It obscures the production reality.

Real agent steps have side effects — writes to databases, calls to external APIs, mutations to shared state. When a step fails mid-execution, you need answers to questions the loop framing ignores: Has the side effect already happened? Is it safe to retry? Is the agent's understanding of current state still accurate? Which step do you re-run, and from what preconditions?

Agentic features in PHP applications face the same failure modes as any distributed workflow. The model response is just one I/O call among many. Treating the whole loop as a single synchronous transaction is incorrect; treating each step as fire-and-forget is also incorrect. What you want is explicit state, explicit transitions, and deterministic recovery. That is what state machines give you, and it is why the Symfony Workflow Component maps cleanly onto agent orchestration.

The Core Abstraction: Agent Step as Workflow Transition

In Symfony Workflow, an entity moves through places via transitions. A transition fires only when the entity is in the right place, its guards pass, and you explicitly apply it. The component tracks state on the entity and dispatches events before, during, and after each transition.

Map that onto an agent loop: the agent's run object is the entity. Each tool call or reasoning step is a transition. The set of valid places describes where the agent can legitimately be at any point in time.

Here is a simplified YAML definition for a document processing agent:

framework:
  workflows:
    invoice_agent:
      type: state_machine
      marking_store:
        type: method
        property: status
      supports:
        - App\Agent\InvoiceAgentRun
      initial_marking: created
      places:
        - created
        - fetching_document
        - extracting_fields
        - validating_fields
        - writing_to_erp
        - completed
        - failed
      transitions:
        fetch:
          from: created
          to: fetching_document
        extract:
          from: fetching_document
          to: extracting_fields
        validate:
          from: extracting_fields
          to: validating_fields
        write:
          from: validating_fields
          to: writing_to_erp
        complete:
          from: writing_to_erp
          to: completed
        fail:
          from: [fetching_document, extracting_fields, validating_fields, writing_to_erp]
          to: failed

The entity class stores the current place, the run inputs, and any intermediate outputs:

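namespace App\Agent;

class InvoiceAgentRun
{
    // Current place; read and written by the workflow's marking store
    public string $status = 'created';

    // Intermediate outputs, filled in as the run advances
    public ?string $documentContent = null;
    public ?array $rawLlmResponse = null;
    public array $extractedFields = [];
    public ?array $extractionTokens = null;
    public ?string $erpRecordId = null;

    public ?\DateTimeImmutable $startedAt = null;
    public ?\DateTimeImmutable $completedAt = null;
    public ?string $failureReason = null;

    public function __construct(
        public readonly string $id,
        public readonly string $documentUrl,
        public readonly string $tenantId,
    ) {}
}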

The orchestrator applies transitions in sequence, persisting state after each one. If the process dies between extracting_fields and validating_fields, you know exactly where you were and can resume — or hand off to a human reviewer — without guessing.

Making Steps Idempotent

Knowing where you are is only half the problem. The other half is ensuring that re-running a step from a known state produces the same outcome without duplicate side effects.

For read-only steps such as fetching a document or calling the model, idempotency is usually free: repeating them has no external side effect, only a cost in time and tokens. For write steps, you need an explicit mechanism to prevent double-writes on retry.

The standard pattern is an outbox table. Before applying a transition that has an external side effect, insert a record into the outbox inside the same database transaction that updates the entity's marking. A separate worker reads the outbox and performs the external call, marking the record as sent once the call succeeds. If the worker crashes mid-flight, the outbox record is still there on restart. Because every attempt carries the same idempotency key, an external system that honours such keys ignores the duplicate.

use Symfony\Component\Workflow\Event\TransitionEvent;

// Listener for workflow.invoice_agent.transition.write
public function onWrite(TransitionEvent $event): void
{
    $run = $event->getSubject();

    // The "transition" event fires before the marking changes, so only stage
    // the outbox entry here. The flush() that follows apply() then writes the
    // new marking (writing_to_erp) and this entry in one database transaction.
    $outbox = new ErpOutboxEntry(
        runId: $run->id,
        idempotencyKey: "erp-write-{$run->id}",
        payload: $run->extractedFields,
    );
    $this->entityManager->persist($outbox);
}

The outbox worker is a Symfony Messenger consumer. If the ERP call fails, Messenger retries with backoff. If it exhausts retries, the message lands in the dead-letter queue and you can inspect it, correct the payload, and replay manually.
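The retry behaviour of that worker can be sketched without any framework code. The following is a minimal, dependency-free illustration, not the production consumer: FakeErp, OutboxEntry, and drainOutbox are hypothetical names, and the sketch assumes the ERP deduplicates on the idempotency key.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the ERP API. Assumption: the real ERP
// deduplicates writes on an idempotency key in the same way.
final class FakeErp
{
    /** @var array<string, array> records keyed by idempotency key */
    public array $records = [];

    public function write(string $idempotencyKey, array $payload): void
    {
        // A repeated key is ignored, so a redelivered entry cannot double-write.
        $this->records[$idempotencyKey] ??= $payload;
    }
}

// Hypothetical in-memory version of an outbox row.
final class OutboxEntry
{
    public bool $sent = false;

    public function __construct(
        public readonly string $idempotencyKey,
        public readonly array $payload,
    ) {}
}

/**
 * Sends every pending entry and marks it as sent only after the call returns.
 * Returns the number of ERP calls attempted in this pass.
 *
 * @param OutboxEntry[] $entries
 */
function drainOutbox(array $entries, FakeErp $erp): int
{
    $attempted = 0;
    foreach ($entries as $entry) {
        if ($entry->sent) {
            continue; // delivered in an earlier pass
        }
        $erp->write($entry->idempotencyKey, $entry->payload);
        $attempted++;
        $entry->sent = true; // a real worker would persist this flag after the call succeeds
    }

    return $attempted;
}
```

If the worker dies after the ERP call but before the sent flag is saved, the next pass repeats the call with the same key and the ERP still ends up with a single record.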

The LLM Call as a Guarded Transition

Model calls deserve their own treatment. They are expensive, non-deterministic, and slow. You want to fire them at most once per step and cache results so that a retry does not re-invoke the model if you already have a valid response.

Guards are the right place to check whether a step should be retried at all:

use Symfony\Component\Workflow\Event\GuardEvent;

public function guardExtract(GuardEvent $event): void
{
    $run = $event->getSubject();

    // Block transition if we already have a cached extraction
    if (!empty($run->extractedFields)) {
        $event->setBlocked(true, 'Extraction already complete — skipping.');
    }
}

The actual model call lives in a transition listener. Store the raw model response on the entity alongside the parsed fields — you want the eval trace, not just the structured output:

public function onExtract(TransitionEvent $event): void
{
    $run = $event->getSubject();

    $response = $this->llmClient->extract(
        document: $run->documentContent,
        schema: InvoiceFieldSchema::get(),
        model: 'claude-sonnet-4-6',
    );

    $run->rawLlmResponse = $response->raw;
    $run->extractedFields = $response->parsed;
    $run->extractionTokens = $response->usage;
}

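Because the entity is persisted after each transition, rawLlmResponse and extractedFields survive a crash. A replayed extract message then finds the run already in extracting_fields, so can() returns false and the handler exits without touching the model again. The guard covers the remaining case, where extractedFields is populated but the marking still points at fetching_document. A blocked guard also makes can() return false, so the resolver has to route such a run on to validation itself. Skipping the second model call matters both for cost and for consistency across retries. This is one of the more practical benefits of building LLM tool use on Symfony's structured orchestration layer.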

Using Messenger for the Outer Loop

The agent loop itself should be asynchronous. Dispatching each transition step as a Messenger message separates the agent's execution from HTTP request cycles, gives you per-step backoff and retry, and means the whole run survives an application restart.

namespace App\Agent\Message;

final class ApplyAgentTransition
{
    public function __construct(
        public readonly string $runId,
        public readonly string $transition,
    ) {}
}

The handler loads the run, checks that the transition is still applicable (the workflow enforces this via can()), applies it, persists, and dispatches the next transition:

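use Symfony\Component\Messenger\Exception\UnrecoverableMessageHandlingException;

public function __invoke(ApplyAgentTransition $message): void
{
    $run = $this->runRepository->find($message->runId);
    if ($run === null) {
        // Retrying cannot help here, so skip the retry strategy entirely
        throw new UnrecoverableMessageHandlingException("Run {$message->runId} not found.");
    }

    if (!$this->workflow->can($run, $message->transition)) {
        // Already advanced (or blocked by a guard): idempotent no-op
        return;
    }

    // Listeners run inside apply(); flush() commits the new marking and
    // anything they staged in a single database transaction
    $this->workflow->apply($run, $message->transition);
    $this->entityManager->flush();

    $nextTransition = $this->transitionResolver->next($run);
    if ($nextTransition !== null) {
        $this->bus->dispatch(new ApplyAgentTransition($run->id, $nextTransition));
    }
}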
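The handler relies on a transitionResolver that is not shown. For a strictly linear workflow like invoice_agent, a lookup table from place to next transition may be all it needs. The class below is a hypothetical sketch under that assumption; LinearTransitionResolver is an illustrative name, not part of Symfony.

```php
<?php

declare(strict_types=1);

// Hypothetical resolver: maps the run's current place to the transition
// that should fire next. Terminal places have no entry and yield null.
final class LinearTransitionResolver
{
    private const NEXT = [
        'created'           => 'fetch',
        'fetching_document' => 'extract',
        'extracting_fields' => 'validate',
        'validating_fields' => 'write',
        'writing_to_erp'    => 'complete',
    ];

    public function next(object $run): ?string
    {
        // Assumes the run exposes its current place as a public $status property.
        return self::NEXT[$run->status] ?? null;
    }
}
```

A real resolver would probably hold back complete until the outbox worker confirms the ERP write, and would return fail when a step reports an unrecoverable error.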

Retry configuration in messenger.yaml handles transient failures — model timeouts, ERP rate limits, network blips — without any special-casing in the handler:

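framework:
  messenger:
    failure_transport: failed
    transports:
      async:
        dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        retry_strategy:
          max_retries: 5
          delay: 2000
          multiplier: 2
          max_delay: 30000
      # The failure transport has to be defined, or the configuration is rejected
      failed: 'doctrine://default?queue_name=failed'
    routing:
      # Without a routing entry the message is handled synchronously
      App\Agent\Message\ApplyAgentTransition: async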

Five retries with exponential backoff handle most transient failures. Anything beyond that lands in the failed transport, where you can inspect, correct, and replay selectively. The workflow's state on the entity tells you exactly which step failed, so replaying a dead-lettered message starts from precisely the right place.
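To see what those four numbers mean in wall-clock terms, the schedule can be computed by hand. The helper below is a small sketch of the multiplier formula (delay times multiplier to the power of the retry index, capped at max_delay). retryDelays is a hypothetical function name, and the sketch ignores any random jitter the framework may add.

```php
<?php

declare(strict_types=1);

/**
 * Delay before each retry, in milliseconds: delay * multiplier^n, capped at maxDelay.
 * A maxDelay of 0 is treated as "no cap".
 *
 * @return int[]
 */
function retryDelays(int $maxRetries, int $delay, float $multiplier, int $maxDelay): array
{
    $delays = [];
    for ($retry = 0; $retry < $maxRetries; $retry++) {
        $ms = (int) ($delay * $multiplier ** $retry);
        $delays[] = $maxDelay > 0 ? min($ms, $maxDelay) : $ms;
    }

    return $delays;
}
```

With the values from the configuration, retryDelays(5, 2000, 2.0, 30000) yields 2000, 4000, 8000, 16000, and 30000 milliseconds. The fifth delay would be 32000 without the cap, so a message spends about one minute in retries before it reaches the failed transport.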

The Audit Log as an Eval Trace

One underrated benefit of this pattern is that the Workflow component's event system produces a structured audit trail for free. Every entered, completed, and announce event carries the entity, its marking, and the transition; stamp each one with a timestamp as you persist it:

use Symfony\Component\Workflow\Event\EnteredEvent;

public function onEntered(EnteredEvent $event): void
{
    $run = $event->getSubject();
    if (!$run instanceof InvoiceAgentRun) {
        return;
    }

    $this->entityManager->persist(new AgentRunEvent(
        runId: $run->id,
        event: 'entered',
        // getPlaces() is keyed by place name, so read the key, not the value
        place: array_key_first($event->getMarking()->getPlaces()),
        // No transition exists when the initial marking is entered
        transition: $event->getTransition()?->getName(),
        occurredAt: new \DateTimeImmutable(),
        metadata: [
            'extractionTokens' => $run->extractionTokens ?? null,
        ],
    ));
}

This log is your eval dataset. When output quality degrades (extraction accuracy drops, validation false-positive rates climb), you can query by place, filter to failed runs or low-confidence responses, and export the raw LLM response for each case to compare against ground truth. Evals that feed on production data require this kind of systematic trace. Because every step already passes through the workflow's event system, you get that trace without additional instrumentation.

Connecting to Real Models

Nothing in the pattern above is model-specific. The LlmClient interface takes a document, a schema, and a model name, and returns structured output. Wire it to Claude, GPT-4o, or a local Mistral instance; the orchestration layer does not care.

For Claude integration in PHP, the Anthropic API is a straightforward HTTP call. We typically wrap it in a service that handles retries at the HTTP level, separate from Messenger retries at the step level — two layers of retry with different semantics:

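use Symfony\Contracts\HttpClient\HttpClientInterface;

final class AnthropicLlmClient implements LlmClient
{
    public function __construct(
        private readonly HttpClientInterface $httpClient,
        private readonly string $apiKey,
    ) {}

    public function extract(string $document, array $schema, string $model): LlmResponse
    {
        $response = $this->httpClient->request('POST', 'https://api.anthropic.com/v1/messages', [
            'headers' => [
                'x-api-key' => $this->apiKey,
                'anthropic-version' => '2023-06-01',
            ],
            'json' => [
                'model' => $model,
                'max_tokens' => 1024,
                // $schema is a tool definition: name, description, input_schema
                'tools' => [$schema],
                // Force the model to answer through that tool instead of free text
                'tool_choice' => ['type' => 'tool', 'name' => $schema['name']],
                'messages' => [
                    ['role' => 'user', 'content' => $document],
                ],
            ],
        ]);

        // toArray() throws on 4xx/5xx responses, so failures surface to the caller
        return LlmResponse::fromAnthropicToolUse($response->toArray());
    }
}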
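The client hands the decoded body to LlmResponse::fromAnthropicToolUse, which is not shown. A possible shape for that value object is sketched below. The class layout is an assumption inferred from how the listeners read raw, parsed, and usage; the parsing relies on the Messages API returning a list of content blocks in which a tool call has type tool_use and carries its arguments under input.

```php
<?php

declare(strict_types=1);

// Hypothetical value object; the property names mirror how the
// extract listener reads the response (raw, parsed, usage).
final class LlmResponse
{
    public function __construct(
        public readonly array $raw,
        public readonly array $parsed,
        public readonly array $usage,
    ) {}

    public static function fromAnthropicToolUse(array $body): self
    {
        // The structured output sits in the "input" of the first
        // content block whose type is "tool_use".
        foreach ($body['content'] ?? [] as $block) {
            if (($block['type'] ?? null) === 'tool_use') {
                return new self($body, $block['input'] ?? [], $body['usage'] ?? []);
            }
        }

        // No tool call came back (for example a plain-text answer): fail
        // loudly so the step can be retried or dead-lettered.
        throw new \RuntimeException('Model response contained no tool_use block.');
    }
}
```

Keeping the whole decoded body in raw is what makes the stored response usable as an eval trace later, since nothing the model returned is discarded at parse time.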

For teams running multiple providers or falling back to local models, this interface is the only seam that changes. The state machine, the outbox pattern, the retry configuration — all of that stays the same regardless of which model sits behind the extraction step. See our services for how we help teams design and implement these AI integration patterns in production.

What This Pattern Does Not Cover

The Symfony Workflow Component is excellent for sequential agent loops with clear step boundaries. It is less natural for agents that need to branch dynamically mid-loop — choosing between five different tool paths based on model output — or for multi-agent orchestration where sub-agents run concurrently and their results need to be merged. For those cases, a more flexible graph-based orchestration layer is worth the added complexity.

The pattern also does not address prompt injection, model evaluation, or the cost accounting you need once token spend becomes a line item. Those are real production concerns; the orchestration layer is a prerequisite, not a complete solution.

Getting This Running

The minimal setup for this pattern on a Symfony 7 project:

composer require symfony/workflow symfony/messenger symfony/http-client symfony/orm-pack

Define your workflow in config/packages/workflow.yaml, create your entity and outbox table, write the transition listeners, configure Messenger with a real transport (Doctrine or Redis both work well), and deploy a worker alongside your application. The workflow visualizer (bin/console workflow:dump invoice_agent | dot -Tpng > agent.png) gives you a diagram you can share with the non-engineering stakeholders who also have opinions about the agent's behaviour.

If your team is looking at LLM agents in a PHP codebase and wants a pattern that survives the first contact with production, this is a solid starting point. Reach out at hello@wolf-tech.io or visit wolf-tech.io — we help teams design and implement agentic features in Symfony and Next.js applications that hold up under real load, real failures, and real finance departments asking pointed questions about AI costs.