Integrate OpenAI Into a PHP Application: A Production-Grade Symfony Pattern
The first version almost always looks the same. Someone drops the official SDK into a controller, calls $client->chat()->create() inline, echoes the result, and ships it. It demos beautifully. Then it meets production: a request hangs for nine seconds because the model was slow, the API key turns up in a stack trace, nobody can answer how much last month cost, and a provider hiccup takes a checkout flow down with it.
If you want to integrate OpenAI into a PHP application that survives real traffic, the SDK call is the easy 10 percent. The other 90 percent is everything around it: where the call lives, how it fails, how you keep it off the request thread, and how you know what it is costing you. This post lays out the pattern we use on client Symfony codebases, with the architectural decisions explained rather than just the code.
Keep the Model Call at the Edge, Never in the Domain
The single most important decision happens before you write a line of integration code: the LLM call belongs at the application edge, not inside your domain model. Your Doctrine entities, your pricing rules, your permission checks have to stay purely deterministic. They should never call OpenAI, parse a completion, or branch on model output directly.
Treat OpenAI as an I/O adapter, exactly like you would treat a payment gateway or an email provider. In Symfony this maps to a dedicated service with a narrow interface. Define an interface first so the rest of the application depends on the abstraction, not the vendor:
interface SummaryGenerator
{
    public function summarize(string $input): SummaryResult;
}
The concrete OpenAiSummaryGenerator is the only class in your codebase that knows OpenAI exists. It builds the request, calls the API, normalizes the response into a typed SummaryResult DTO, and hands that clean object back. Everything upstream depends on SummaryGenerator. The payoff is immediate: you can swap providers, mock the interface in tests, and stop a vendor outage from rippling into your business logic. When clients ask us to review an AI integration during a code quality assessment, a model call buried inside an entity method is the first thing we flag.
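To make the testing payoff concrete, here is a minimal sketch of the abstraction with a deterministic fake behind it. The fields on SummaryResult, the FakeSummaryGenerator class, and the DocumentPreview consumer are all hypothetical names chosen for illustration; the article itself only fixes the interface.

```php
<?php
declare(strict_types=1);

// Hypothetical DTO: the real SummaryResult may carry more fields.
final class SummaryResult
{
    public function __construct(public readonly string $summary) {}
}

interface SummaryGenerator
{
    public function summarize(string $input): SummaryResult;
}

// Deterministic stand-in for tests: no network, no vendor SDK.
final class FakeSummaryGenerator implements SummaryGenerator
{
    public function summarize(string $input): SummaryResult
    {
        // Truncate instead of calling a model so results are reproducible.
        return new SummaryResult(substr($input, 0, 20));
    }
}

// Hypothetical upstream consumer: it type-hints the interface, never the OpenAI adapter.
final class DocumentPreview
{
    public function __construct(private SummaryGenerator $generator) {}

    public function teaser(string $body): string
    {
        return $this->generator->summarize($body)->summary;
    }
}

$preview = new DocumentPreview(new FakeSummaryGenerator());
echo $preview->teaser('Quarterly revenue grew across all regions.'), PHP_EOL;
```

In a Symfony container you would alias SummaryGenerator to the OpenAI-backed class in production and to the fake in the test environment, so nothing upstream changes between the two.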
Constrain the Output With a Schema, Then Validate It
Model output is not deterministic. The same prompt yields different text on different calls, which is fine for a creative draft and dangerous the moment that output touches a database write. The antidote is a strict contract between the model and the rest of your system.
Use structured output. The Chat Completions and Responses APIs both support a JSON schema response format that, in strict mode, constrains the model to emit JSON matching the schema. Mirror that schema in code as a readonly DTO, deserialize the response with the Symfony Serializer, then run it through the Validator component before anything else touches it:
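// Ask for schema-constrained JSON; $schema holds the schema name, the JSON Schema itself, and the strict flag.
$payload = $this->client->chat()->create([
    'model' => 'gpt-4.1-mini',
    'messages' => $messages,
    'response_format' => ['type' => 'json_schema', 'json_schema' => $schema],
]);

// Deserialize the raw completion into the typed DTO.
$dto = $this->serializer->deserialize(
    $payload->choices[0]->message->content,
    SummaryResult::class,
    'json'
);

// Validate before anything else touches the result.
$violations = $this->validator->validate($dto);
if (count($violations) > 0) {
    throw new InvalidModelOutput($violations);
}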
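For reference, here is one way the $schema array and the matching DTO could look. This is a sketch under assumptions: the summary and sentiment fields are hypothetical, and the constraints are examples rather than a recommendation. The Assert attributes come from the Symfony Validator component.

```php
<?php
declare(strict_types=1);

use Symfony\Component\Validator\Constraints as Assert;

// Hypothetical fields; pick whatever your feature actually needs.
final class SummaryResult
{
    public function __construct(
        #[Assert\NotBlank]
        #[Assert\Length(max: 600)]
        public readonly string $summary,

        #[Assert\Choice(choices: ['positive', 'neutral', 'negative'])]
        public readonly string $sentiment,
    ) {}
}

// The array passed as 'json_schema' in the Chat Completions request.
$schema = [
    'name' => 'summary_result',
    'strict' => true,
    'schema' => [
        'type' => 'object',
        'properties' => [
            'summary' => ['type' => 'string'],
            'sentiment' => ['type' => 'string', 'enum' => ['positive', 'neutral', 'negative']],
        ],
        'required' => ['summary', 'sentiment'],
        'additionalProperties' => false,
    ],
];
```

The validator still earns its place next to the schema: rules the schema does not express, such as the length cap in this sketch, are enforced only on the PHP side.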
Give yourself a small retry budget, usually one or two attempts, before surfacing a graceful error. The object that eventually gets persisted is built from the validated DTO, never from the raw completion. This also makes the service testable: because it returns a typed result, your business-logic tests can mock it with deterministic fakes and stay fast and reproducible regardless of whether the API is reachable.
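The retry budget can be as small as a loop around the call-and-validate step. The helper below is a sketch, not part of any SDK: withRetryBudget is a hypothetical name, and InvalidModelOutput is simplified here to a plain exception rather than one that carries the violation list.

```php
<?php
declare(strict_types=1);

// Simplified for the sketch; the real class would wrap the validator's violations.
final class InvalidModelOutput extends RuntimeException {}

/**
 * Runs $attempt up to $maxAttempts times and returns the first validated result.
 * $attempt is expected to call the model, deserialize, validate,
 * and throw InvalidModelOutput when validation fails.
 */
function withRetryBudget(callable $attempt, int $maxAttempts = 2): mixed
{
    $last = null;
    for ($i = 1; $i <= $maxAttempts; $i++) {
        try {
            return $attempt($i);
        } catch (InvalidModelOutput $e) {
            // Only invalid output is retried; other failures propagate immediately.
            $last = $e;
        }
    }

    // Budget exhausted: surface one clear error the caller can turn into a fallback.
    throw new RuntimeException(
        'Model output stayed invalid after ' . $maxAttempts . ' attempts',
        0,
        $last
    );
}

// Usage with a simulated model that fails validation once, then succeeds.
$result = withRetryBudget(function (int $attemptNumber): string {
    if ($attemptNumber === 1) {
        throw new InvalidModelOutput('summary must not be blank');
    }
    return 'validated summary';
});
echo $result, PHP_EOL;
```

Keeping the budget at one or two attempts matters because every retry is another paid call with the same multi-second latency.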
Get the Call Off the Request Thread
A synchronous OpenAI call inside a web request is the defect we see most often. Model latency is routinely two to ten seconds and hard to predict, and PHP-FPM has a finite worker pool. Tie up enough workers waiting on a slow model and the whole site stops responding, including pages that have nothing to do with AI.
For anything that is not strictly interactive, move the call to a background worker. Symfony Messenger is built for exactly this. The controller dispatches a message and returns immediately; a worker consuming the queue does the slow work:
public function requestSummary(Document $document, MessageBusInterface $bus): JsonResponse
{
    $bus->dispatch(new GenerateSummary($document->getId()));
    return new JsonResponse(['status' => 'queued'], 202);
}
The handler runs in a separate process, calls OpenAI, persists the validated result, and notifies the client through Mercure, a webhook, or simple polling. Web workers stay free, slow generations no longer block unrelated traffic, and Messenger gives you retries with backoff and a failure transport for free. Configure a sane max_retries and a dedicated failed transport so a transient 429 retries automatically while a genuinely broken message lands somewhere you can inspect it.
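The worker side might look roughly like this. The GenerateSummary message matches the controller above; DocumentStore is a hypothetical port standing in for your Doctrine repository, and the SummaryResult shape is assumed. The AsMessageHandler attribute is how Messenger discovers handlers when autoconfiguration is enabled.

```php
<?php
declare(strict_types=1);

use Symfony\Component\Messenger\Attribute\AsMessageHandler;

// The message carries only an identifier, never the entity itself.
final class GenerateSummary
{
    public function __construct(public readonly int $documentId) {}
}

// Assumed shapes, kept minimal for the sketch.
final class SummaryResult
{
    public function __construct(public readonly string $summary) {}
}

interface SummaryGenerator
{
    public function summarize(string $input): SummaryResult;
}

// Hypothetical persistence port; a Doctrine repository would sit behind it.
interface DocumentStore
{
    public function bodyOf(int $documentId): ?string;
    public function saveSummary(int $documentId, string $summary): void;
}

#[AsMessageHandler]
final class GenerateSummaryHandler
{
    public function __construct(
        private DocumentStore $documents,
        private SummaryGenerator $generator,
    ) {}

    public function __invoke(GenerateSummary $message): void
    {
        $body = $this->documents->bodyOf($message->documentId);
        if ($body === null) {
            // The document disappeared while the message was queued; retrying cannot help.
            return;
        }

        // An exception thrown here is what triggers Messenger's retry strategy.
        $result = $this->generator->summarize($body);
        $this->documents->saveSummary($message->documentId, $result->summary);
    }
}
```

Passing the ID rather than the entity keeps the message small and forces the handler to reload current state, which matters when the job runs seconds or minutes after dispatch.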
When the feature genuinely is interactive, such as a chat box where the user is watching, stream instead. Expose the call through a StreamedResponse and let the OpenAI stream flush token by token. Perceived latency drops dramatically because text appears as it generates rather than after a multi-second blank pause. Streaming and background processing are not in competition; you pick per feature based on whether a human is actively waiting on the result.
Build Real Error Boundaries
A model call has more failure modes than most HTTP calls: timeouts, 429 rate limits, 500s from the provider, content filter refusals, and malformed JSON that slips past structured output. Each one needs a defined behavior, decided in advance, not discovered in an incident.
Wrap the call at the service boundary and translate vendor exceptions into your own domain exceptions, so nothing downstream catches a raw OpenAI error class. For every AI feature, answer one question before launch: what does the product do when this call fails? A summary that fails can fall back to showing the underlying data with a quiet "summary unavailable" note. An AI search that fails can fall back to deterministic keyword search. A classification step that fails can fall back to a manual queue, slower but functional. The unacceptable answer is an unhandled exception reaching the user.
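As a sketch of that boundary, the snippet below translates any vendor failure into a single domain exception and shows the summary fallback described above. SummaryUnavailable, guardVendorCall, and renderDocument are hypothetical names; in real code you would narrow the catch to the exception types your SDK and HTTP client actually throw.

```php
<?php
declare(strict_types=1);

// Domain-level exception: the only AI failure type upstream code should ever see.
final class SummaryUnavailable extends RuntimeException {}

/**
 * Runs a vendor call and rethrows any failure as SummaryUnavailable.
 * Catching Throwable is deliberately broad for this sketch.
 */
function guardVendorCall(callable $vendorCall): string
{
    try {
        return $vendorCall();
    } catch (Throwable $e) {
        // Keep the original as "previous" so logs retain the vendor detail.
        throw new SummaryUnavailable('Summary generation failed', 0, $e);
    }
}

/** Renders a document with its summary, degrading to a quiet note on failure. */
function renderDocument(string $body, callable $summarize): string
{
    try {
        $summary = guardVendorCall(fn (): string => $summarize($body));
    } catch (SummaryUnavailable) {
        $summary = 'Summary unavailable';
    }

    return $summary . "\n\n" . $body;
}

// Simulated provider failure: the page still renders with the underlying data.
echo renderDocument('Full document text.', function (string $body): string {
    throw new RuntimeException('HTTP 429 from provider');
}), PHP_EOL;
```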
Set an explicit timeout on the HTTP client, shorter than your worker timeout, and treat a slow model as a failure you can fall back from rather than a worker you let hang indefinitely. A circuit breaker in front of the provider keeps a sustained outage from generating thousands of doomed retries; once the breaker opens, you serve the fallback path directly until the provider recovers.
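A circuit breaker does not need a library to be understood. The class below is a deliberately small, in-memory sketch with hypothetical names and thresholds. Because PHP-FPM does not share memory between requests, a real deployment would keep the failure count and open timestamp in shared storage such as Redis or APCu; in-memory state only makes sense inside a long-running worker.

```php
<?php
declare(strict_types=1);

final class CircuitBreaker
{
    private int $failures = 0;
    private ?float $openedAt = null;

    public function __construct(
        private int $failureThreshold = 5,
        private float $cooldownSeconds = 30.0,
    ) {}

    /**
     * Runs $call unless the breaker is open; returns $fallback() when the
     * call is skipped or fails. $now is injectable so tests can control time.
     */
    public function call(callable $call, callable $fallback, ?float $now = null): mixed
    {
        $now ??= microtime(true);

        // Open and still cooling down: skip the provider entirely.
        if ($this->openedAt !== null && ($now - $this->openedAt) < $this->cooldownSeconds) {
            return $fallback();
        }

        // Closed, or cooldown elapsed: let one call through as a trial.
        try {
            $result = $call();
            $this->failures = 0;
            $this->openedAt = null;
            return $result;
        } catch (Throwable) {
            $this->failures++;
            if ($this->failures >= $this->failureThreshold) {
                $this->openedAt = $now; // open (or re-open) the breaker
            }
            return $fallback();
        }
    }
}
```

After the cooldown a single trial call is allowed through. If it succeeds the breaker closes; if it fails the breaker re-opens for another cooldown period.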
Protect the Key, Scope the Access
The API key is a billing credential. Leaking it means someone else spends your money, so it never appears in code, never in a frontend bundle, and never in a log line. Keep it in Symfony's secrets vault or an injected environment variable, and route every call through your server so the key stays out of any client the browser can read.
Multi-tenant systems need a second layer. If tenants can trigger model calls, attribute every request to a tenant and enforce per-tenant rate and spend limits. Without that, one tenant, abusive or simply buggy, can exhaust a shared quota and degrade the feature for everyone. Tenant scoping at the AI boundary is the same discipline you already apply to database queries, just extended to a metered external resource.
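A per-tenant spend limit can start as a simple budget check in front of the adapter. TenantBudget below is a hypothetical, in-memory sketch of the idea; a production version would persist counters in a shared store and reset them per billing window, or build on Symfony's RateLimiter component for request-rate limits.

```php
<?php
declare(strict_types=1);

final class TenantBudget
{
    /** @var array<string, int> tokens spent per tenant in the current window */
    private array $spent = [];

    public function __construct(private int $tokensPerWindow) {}

    /** Checks, before the call, whether the tenant can afford the estimated tokens. */
    public function canSpend(string $tenantId, int $estimatedTokens): bool
    {
        return ($this->spent[$tenantId] ?? 0) + $estimatedTokens <= $this->tokensPerWindow;
    }

    /** Records actual usage after the call, taken from the response's usage data. */
    public function record(string $tenantId, int $actualTokens): void
    {
        $this->spent[$tenantId] = ($this->spent[$tenantId] ?? 0) + $actualTokens;
    }
}

// One tenant exhausting its budget does not affect another.
$budget = new TenantBudget(tokensPerWindow: 1000);
$budget->record('tenant-a', 900);
var_dump($budget->canSpend('tenant-a', 400)); // tenant-a is over budget
var_dump($budget->canSpend('tenant-b', 400)); // tenant-b is unaffected
```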
Instrument Cost and Usage From Day One
"How much is this costing us?" is a question that arrives the moment AI hits a real invoice, and teams that did not plan for it cannot answer. Every non-streamed OpenAI response includes a usage object with prompt and completion token counts; streamed Chat Completions calls report usage only when you request it through stream_options with include_usage set to true. Capture it on every call.
Log per request the model, prompt and completion tokens, latency, the calling feature, and the tenant where relevant. Persist it to a table or push it to your metrics backend. That record answers which features cost the most, which tenants drive spend, whether a prompt change moved token usage, and when latency is creeping up. It also lets you set spending alerts before a runaway loop produces a five-figure surprise. Observability is not a nice-to-have you add later; it is the difference between a feature you control and one that controls your budget. We treat usage instrumentation as a launch requirement on every integration we ship through custom software development, not a follow-up ticket.
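The per-request record can be a small immutable object that you persist or forward to metrics. UsageRecord is a hypothetical name, and the prices in the usage example are placeholders rather than real OpenAI rates; load actual prices from configuration so they can change without a deploy.

```php
<?php
declare(strict_types=1);

final class UsageRecord
{
    public function __construct(
        public readonly string $feature,
        public readonly ?string $tenantId,
        public readonly string $model,
        public readonly int $promptTokens,
        public readonly int $completionTokens,
        public readonly float $latencyMs,
    ) {}

    /** Cost in USD, given caller-supplied prices per one million tokens. */
    public function costUsd(float $promptPricePerMillion, float $completionPricePerMillion): float
    {
        return ($this->promptTokens * $promptPricePerMillion
            + $this->completionTokens * $completionPricePerMillion) / 1_000_000;
    }
}

// Token counts would come from the response's usage object; these are made up.
$record = new UsageRecord(
    feature: 'document-summary',
    tenantId: 'tenant-a',
    model: 'gpt-4.1-mini',
    promptTokens: 1200,
    completionTokens: 300,
    latencyMs: 2350.0,
);

// Placeholder prices, not real rates.
printf("%.6f USD\n", $record->costUsd(1.0, 2.0));
```

Storing raw token counts rather than a precomputed cost keeps historical rows correct when prices change, since cost can always be recomputed from the counts.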
Putting the Pattern Together
A production-grade OpenAI integration in Symfony has a recognizable shape. The call lives behind an interface at the edge. The output is schema-constrained and validated into a typed DTO before it touches your data. Slow calls run through Messenger so they never block web workers, while genuinely interactive ones stream. Every failure mode has a defined fallback. The key stays server-side and access is tenant-scoped. And every call is instrumented for cost and latency.
None of this is exotic. It is the same engineering discipline you already apply to payment gateways and email providers, pointed at a slower, non-deterministic, metered dependency. The teams that struggle are the ones that treat the model as a special case and skip the boundaries they would never skip elsewhere. Treat OpenAI as ordinary infrastructure with unusual latency and variance, and the integration becomes maintainable.
If you are retrofitting AI into an existing system, the same principles apply with extra care around where the boundary sits; our retrofit playbook goes deeper on that case. And if you would rather have a second set of eyes on an integration before it ships, that is exactly the kind of work we do. Reach us at hello@wolf-tech.io or at wolf-tech.io, and we will help you get it production-ready.

