Adding AI Features to an Existing Symfony or Next.js SaaS: A Retrofit Playbook
The product manager wants an AI summary button. The CTO wants a "smart assistant." Sales saw a competitor demo and now there is a board-level mandate to ship AI features this quarter. Sound familiar?
If you are trying to add AI features to an existing SaaS product — rather than building an AI-native app from scratch — you are in the majority. Most production SaaS codebases running in 2026 are mature Symfony or Next.js applications with years of business logic, established domain models, and real customers who cannot afford downtime. Retrofitting large language model (LLM) capabilities into that environment is a different engineering problem than greenfield AI development, and most tutorials skip the hard parts.
This playbook covers what we have learned working with clients who are in exactly this situation: where to draw the LLM boundary, how to keep non-deterministic outputs from leaking into deterministic systems, graceful degradation when your model provider is unavailable, streaming UX in the Next.js App Router, and how to keep AI outputs consistent with the rest of your data model.
Decide Where the LLM Boundary Lives Before Writing a Line of Code
The most consequential architectural decision when you add AI features to an existing SaaS is choosing where AI ends and the rest of your system begins. Get this wrong and you will spend months untangling LLM responses from business logic.
The principle to follow: LLM calls belong at the application edge, not in the domain model. Your domain model — Doctrine entities in Symfony, Prisma models in Next.js, your pricing rules, your permission system — has to remain purely deterministic. It should never call an LLM, parse an LLM response, or branch on an LLM output directly.
Instead, treat the LLM as an I/O adapter. In Symfony terms, this maps cleanly to a dedicated service layer: an AiSummaryService, a ContractAnalyzerService, a SmartSearchService. Each service calls the LLM, normalises the response into a typed DTO, validates it, and hands a clean, strongly-typed result to the domain. The domain never sees a raw completion.
In a Next.js context with the App Router, the same boundary exists between your Server Actions (or API route handlers) and your data access layer. Server Actions are a reasonable place to orchestrate an LLM call: they run on the server, they can stream, and they keep the provider API key out of the client. What they should not do is merge raw LLM output directly into your Prisma write. Parse, validate, and normalise first.
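To make the boundary concrete, here is a minimal sketch of an adapter in TypeScript. The names LlmClient, SummaryDto, toSummaryDto and summariseTicket are hypothetical, and the provider is reduced to a "text in, text out" interface; a real integration would wrap whichever SDK you use behind the same shape.

```typescript
// Hypothetical names throughout: LlmClient, SummaryDto, summariseTicket.

// The only thing the adapter knows about the provider: text in, text out.
interface LlmClient {
  complete(promptText: string): Promise<string>;
}

// The typed result the rest of the application is allowed to see.
interface SummaryDto {
  readonly headline: string;
  readonly bullets: readonly string[];
}

// Normalise a raw completion into a SummaryDto, or return null if it does not fit.
function toSummaryDto(raw: string): SummaryDto | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // prose, truncated JSON, markdown fences, and so on
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { headline, bullets } = parsed as Record<string, unknown>;
  if (typeof headline !== "string" || headline.trim() === "") return null;
  if (!Array.isArray(bullets) || !bullets.every((b) => typeof b === "string")) return null;
  return { headline: headline.trim(), bullets: bullets as string[] };
}

// The adapter: calls the model, then hands back a DTO (or null), never raw text.
async function summariseTicket(client: LlmClient, ticketBody: string): Promise<SummaryDto | null> {
  const raw = await client.complete(`Summarise as JSON {headline, bullets}:\n${ticketBody}`);
  return toSummaryDto(raw);
}
```

Everything downstream of summariseTicket works with SummaryDto only, so the data access layer never has to know a model was involved.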
Isolating Non-Determinism from Established Business Logic
LLMs are not deterministic. Given the same input, they produce different output on different calls. This is a feature for creative tasks and a liability for transactional systems.
The antidote is a narrow, explicit contract between the AI layer and the rest of your application. Practically, that means structured output. Both major providers (OpenAI, Anthropic) support structured-output or tool-use patterns that steer the model toward a schema-shaped response. Use them, but do not trust them blindly. Define the schema in code (a PHP readonly class in Symfony, a Zod schema in Next.js) and reject any response that does not validate against it.
In Symfony, a typical pattern looks like this: your AI service sends the prompt, specifies a JSON schema via the response_format parameter, deserialises the response into a typed DTO using the Symfony Serializer, then validates it with the Validator component. If validation fails, you have a retry budget (usually one or two retries) before you surface a graceful error. The domain object that ultimately gets written to your database is created from the DTO, not from the raw completion.
In Next.js, Zod is the natural fit. Parse the LLM response with the schema's safeParse() method, handle the error case before you update any state, and never let an unvalidated completion touch your database writes or your React state tree.
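The validate-then-retry loop described above can be sketched without any dependency. This is an illustration under stated assumptions, not a reference implementation: Classification, validateClassification and completeWithRetryBudget are hypothetical names, and the hand-written validator stands in for what a Zod schema's safeParse() (or the Symfony Validator) would do in a real codebase.

```typescript
// A validation result that forces callers to check before touching the value.
type Validation<T> = { ok: true; value: T } | { ok: false; reason: string };

const LABELS = ["bug", "billing", "other"] as const;
type Label = (typeof LABELS)[number];

// Hypothetical DTO for a ticket classification feature.
interface Classification {
  readonly label: Label;
  readonly confidence: number;
}

function isLabel(value: unknown): value is Label {
  return LABELS.some((known) => known === value);
}

// Hand-rolled stand-in for a schema validator.
function validateClassification(raw: string): Validation<Classification> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not JSON" };
  }
  if (typeof data !== "object" || data === null) return { ok: false, reason: "not an object" };
  const { label, confidence } = data as Record<string, unknown>;
  if (!isLabel(label)) return { ok: false, reason: "unknown label" };
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return { ok: false, reason: "confidence out of range" };
  }
  return { ok: true, value: { label, confidence } };
}

// Call the model, validate, and retry up to maxRetries times on invalid output.
// callModel is an assumption: any function that returns the raw completion text.
async function completeWithRetryBudget<T>(
  callModel: () => Promise<string>,
  validate: (raw: string) => Validation<T>,
  maxRetries = 2,
): Promise<Validation<T>> {
  let last: Validation<T> = { ok: false, reason: "model was never called" };
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    last = validate(await callModel());
    if (last.ok) return last;
  }
  return last; // budget exhausted: the caller surfaces a graceful error
}
```

The return type is the important part: a caller cannot reach value without first checking ok, which is the same guarantee safeParse() gives you.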
The practical benefit beyond correctness is testability. Because your AI service returns typed DTOs, you can write unit tests that mock the AI service with deterministic fake responses. Your business logic tests stay fast, reproducible, and fully independent of whether the model API is reachable.
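As an illustration of that testability, the sketch below puts a business rule behind a hypothetical TicketClassifier interface and exercises it with a canned fake. The 0.8 threshold and the "manual-triage" queue name are invented for the example.

```typescript
// Hypothetical interface that the real AI service and the test fake both implement.
interface ClassifierResult {
  readonly label: string;
  readonly confidence: number;
}

interface TicketClassifier {
  classify(text: string): Promise<ClassifierResult>;
}

// Deterministic business rule: low-confidence results go to a human.
async function chooseQueue(classifier: TicketClassifier, text: string): Promise<string> {
  const { label, confidence } = await classifier.classify(text);
  return confidence >= 0.8 ? label : "manual-triage";
}

// Test double: always returns the same DTO, so no network and no model are involved.
function fakeClassifier(canned: ClassifierResult): TicketClassifier {
  return { classify: async () => canned };
}
```

A unit test then reads chooseQueue(fakeClassifier({ label: "billing", confidence: 0.95 }), "any text") and asserts on the returned queue name.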
Graceful Fallback When the Model Provider Goes Down
Model provider outages are not hypothetical — they happen. If you have built AI features into a critical path of your product without a fallback strategy, you have introduced a hard dependency on a third party with a different SLA than your own. Your users will not be impressed.
For every AI feature, ask: what does the product do if this call fails or times out? The answers vary by feature type. An AI-generated summary that fails can fall back to showing the raw underlying data with a notice that summarisation is temporarily unavailable. An AI-powered search that fails can fall back to a deterministic keyword search. An AI classification step that fails can fall back to requiring manual classification — slower, but functional.
In Symfony, retries and circuit breaking are separate concerns: the symfony/http-client retry mechanism (RetryableHttpClient) handles transient errors, while a circuit breaker (for example a library like Resiliency) stops calling a provider that keeps failing. Combine whichever you use with catching provider exceptions at the service boundary, which gives you a clean place to insert fallback logic. Log the failure with context (model, prompt hash, error code) so you can monitor provider reliability over time via your observability stack.
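The circuit-breaker idea is language-agnostic, so here is a small sketch in TypeScript to keep the examples in one language. It is a toy under stated assumptions (a failure threshold, a fixed cooldown, an injectable clock for tests), not a replacement for a maintained library, and the class name is our own.

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures the provider is
// skipped for `cooldownMs`, then a single trial call is allowed through.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Run the provider call, or go straight to the fallback while the circuit is open.
  async run<T>(call: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.isOpen()) return fallback();
    try {
      const result = await call();
      this.failures = 0; // any success closes the circuit fully
      this.openedAt = null;
      return result;
    } catch {
      // A real service would log model, prompt hash and error code here.
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      return fallback();
    }
  }

  private isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: let one trial call through; one more failure re-opens.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return false;
    }
    return true;
  }
}
```

The fallback argument is where the deterministic behaviour lives: keyword search instead of AI search, raw data instead of a summary.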
In Next.js, Server Actions should catch provider errors and return a structured error response that the UI handles explicitly — not an unhandled promise rejection that crashes the component tree. Consider a { success: false, fallback: true, data: ... } shape that signals to the UI that it should render the non-AI version of the component.
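One way to produce that shape is a small wrapper that a Server Action calls around the provider request. This is a sketch: AiResult and withAiFallback are hypothetical names, the five-second default timeout is an arbitrary assumption, and the error field is an addition to the shape described above.

```typescript
// Discriminated union the UI can switch on without try/catch.
type AiResult<T> =
  | { success: true; fallback: false; data: T }
  | { success: false; fallback: true; data: T; error: string };

// A promise that rejects after `ms`, plus a way to cancel its timer.
function rejectAfter(ms: number): { promise: Promise<never>; cancel: () => void } {
  let cancel = () => {};
  const promise = new Promise<never>((_resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

// Try the AI call; on error or timeout return the non-AI data instead of throwing.
async function withAiFallback<T>(
  aiCall: () => Promise<T>,
  fallbackData: () => T,
  timeoutMs = 5000,
): Promise<AiResult<T>> {
  const timeout = rejectAfter(timeoutMs);
  try {
    const data = await Promise.race([aiCall(), timeout.promise]);
    return { success: true, fallback: false, data };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { success: false, fallback: true, data: fallbackData(), error };
  } finally {
    timeout.cancel(); // never leave a stray timer behind
  }
}
```

In a component, the check is then a plain if (result.fallback) that renders the non-AI version along with a short notice.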
Avoid the temptation to simply hide AI features behind a feature flag and forget about fallbacks. Feature flags are useful for rollout control; they are not a substitute for graceful degradation.
Streaming UX in the Next.js App Router
Streaming is where AI features become perceptually fast. A response that takes eight seconds to generate is frustrating if the UI is blank for eight seconds. The same eight-second generation is tolerable — even engaging — if the text appears word by word as it streams.
The Next.js App Router and the streaming response primitives make this achievable without a third-party library, though the Vercel AI SDK significantly reduces boilerplate if you are comfortable adding the dependency.
The key architectural choice is between streaming from a Server Action and streaming from an API route handler. Server Actions are designed for mutations that return a final state. For streaming text, an API route handler (route.ts in the App Router) using ReadableStream is the cleaner model. The client fetches the stream, reads chunks with a ReadableStreamDefaultReader, and appends each chunk to component state as it arrives.
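A rough sketch of both halves follows, using only the web-standard ReadableStream, Response, TextEncoder and TextDecoder. The token generator is a stand-in for a provider SDK's stream, and the function names are our own; in a real route.ts the GET function would be exported, and the onChunk callback would typically append to React state.

```typescript
// Stand-in for a provider SDK's token stream (an assumption for this sketch).
async function* fakeModelTokens(): AsyncGenerator<string> {
  for (const token of ["Streaming ", "keeps ", "the UI ", "alive."]) yield token;
}

// Server side: wrap a token source in a byte stream and return it as the response.
function streamTokens(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    // pull() is called whenever the consumer is ready for more data.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(value));
    },
    // If the client disconnects, stop consuming the model stream.
    async cancel() {
      await iterator.return?.();
    },
  });
  return new Response(body, { headers: { "Content-Type": "text/plain; charset=utf-8" } });
}

// In a route.ts file this function would be exported as the GET handler.
async function GET(): Promise<Response> {
  return streamTokens(fakeModelTokens());
}

// Client side: read chunks as they arrive and hand each one to the caller.
async function readTextStream(response: Response, onChunk: (text: string) => void): Promise<void> {
  if (!response.body) throw new Error("response has no body");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    onChunk(decoder.decode(value, { stream: true }));
  }
  const tail = decoder.decode(); // flush any buffered bytes
  if (tail !== "") onChunk(tail);
}
```

On the client, the response passed to readTextStream would come from a fetch call to the route.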
One important detail: streaming and Suspense boundaries interact in non-obvious ways. If you wrap a streaming component in a Suspense boundary, React will wait for the first chunk before exiting the fallback. This is usually what you want — it prevents a flash of empty state. But it means your stream needs to emit its first token quickly. Long server-side preprocessing before the first streamed token defeats the purpose of streaming.
For Symfony applications serving a Next.js frontend, streaming to a browser is achievable via Symfony's StreamedResponse on the backend and a standard fetch stream reader on the Next.js side. Keep the streaming boundary at the HTTP level. The Symfony backend streams raw text or newline-delimited JSON; the Next.js client reads and renders it. Do not try to route individual chunks through Symfony's internal event machinery. Write them from the StreamedResponse callback, flush, and let the HTTP layer do the work.
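On the client, the main subtlety with newline-delimited JSON is that network chunks do not line up with message boundaries. A small buffering parser handles that; the sketch below assumes one JSON object per line, and the StreamEvent shape is hypothetical since the backend's message format is not specified here.

```typescript
// Hypothetical message shape; the real one depends on what the backend emits.
interface StreamEvent {
  readonly type: string;
  readonly text?: string;
}

// Incremental NDJSON parser: feed it decoded text chunks, get whole messages out.
function createNdjsonParser<T>(onMessage: (message: T) => void): {
  push(chunk: string): void;
  end(): void;
} {
  let buffer = "";
  const emit = (line: string): void => {
    const trimmed = line.trim();
    // The cast is unchecked: validate each message before using it in earnest.
    if (trimmed !== "") onMessage(JSON.parse(trimmed) as T);
  };
  return {
    push(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
      lines.forEach((line) => emit(line));
    },
    end(): void {
      emit(buffer); // a final line without a trailing newline
      buffer = "";
    },
  };
}
```

Each chunk read from the fetch stream is decoded to text and passed to push(); end() is called once the stream is done.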
Keeping AI Outputs Reconciled with the Rest of the System
A subtler problem surfaces weeks after launch: AI-generated content drifts out of sync with the underlying data it was generated from. The user updates a record. The AI summary in the sidebar still reflects the old version. The AI-generated tag is now incorrect because the category taxonomy changed.
The standard fix is to version AI outputs and re-trigger generation when source data changes. In Symfony, this works well with Doctrine lifecycle events or Messenger messages: when an entity changes, dispatch an event that schedules a regeneration job. Store the AI output with a reference to the source entity version or a content hash, and display a stale indicator when the hash no longer matches.
In Next.js with a Prisma backend, a similar approach uses application-level hooks on mutation paths. When a record updates, invalidate the associated AI output and enqueue regeneration. The UI can show the last-generated output with a "Refreshing…" indicator while new generation runs in the background.
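The content-hash check can be sketched in a few lines. Assumptions to note: SourceRecord and StoredAiOutput are hypothetical shapes, the fingerprint covers only the fields that feed the prompt, and the FNV-1a style hash is a dependency-free stand-in where a production system would more likely use SHA-256.

```typescript
// Small non-cryptographic hash (FNV-1a style over UTF-16 code units).
// A stand-in for this sketch only; collisions are possible.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical shapes: the source record and the stored AI output.
interface SourceRecord {
  readonly title: string;
  readonly body: string;
}

interface StoredAiOutput {
  readonly summary: string;
  readonly sourceHash: string; // fingerprint of the record the summary was built from
}

// Hash only the fields that feed the prompt, in a stable order.
function fingerprint(record: SourceRecord): string {
  return contentHash(JSON.stringify([record.title, record.body]));
}

function isStale(output: StoredAiOutput, record: SourceRecord): boolean {
  return output.sourceHash !== fingerprint(record);
}

// Mutation-path hook: decide what the UI shows and whether to enqueue regeneration.
function onRecordUpdated(
  record: SourceRecord,
  stored: StoredAiOutput | undefined,
  enqueueRegeneration: (sourceHash: string) => void,
): "fresh" | "refreshing" {
  const hash = fingerprint(record);
  if (stored !== undefined && stored.sourceHash === hash) return "fresh";
  enqueueRegeneration(hash);
  return "refreshing";
}
```

Because the fingerprint ignores fields that never reach the prompt, edits to unrelated columns do not trigger a regeneration.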
This is also where caching strategy matters. Avoid caching raw LLM completions — cache the parsed, validated DTO or the final stored output. Raw completion caching couples you to the exact prompt format and breaks silently when you refine your prompts.
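A minimal sketch of that rule: the cache stores the validated DTO only, and a completion that fails validation is never written. TagsDto, parseTags and cachedTags are hypothetical names, and the in-memory Map stands in for whatever cache or table you actually use.

```typescript
// Hypothetical DTO for an AI tagging feature.
interface TagsDto {
  readonly tags: readonly string[];
}

// Validate and normalise the raw completion; null means "do not use, do not cache".
function parseTags(raw: string): TagsDto | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { tags } = data as Record<string, unknown>;
  if (!Array.isArray(tags)) return null;
  const cleaned: string[] = [];
  for (const tag of tags) {
    if (typeof tag !== "string" || tag.trim() === "") return null;
    cleaned.push(tag.trim().toLowerCase());
  }
  return { tags: cleaned };
}

// Cache lookup keyed by a fingerprint of the source data, not by the prompt text.
async function cachedTags(
  cache: Map<string, TagsDto>,
  sourceKey: string,
  callModel: () => Promise<string>,
): Promise<TagsDto | null> {
  const hit = cache.get(sourceKey);
  if (hit !== undefined) return hit;
  const dto = parseTags(await callModel());
  if (dto !== null) cache.set(sourceKey, dto); // only validated output is stored
  return dto;
}
```

Refining the prompt wording changes what the model returns but not the shape of what is cached, so existing entries stay readable.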
A Note on Prompt Management in a Mature Codebase
One aspect of AI integration that mature codebases feel acutely is prompt management. Prompts are code. They belong in version control, they need to be tested when they change, and changes to them should go through the same review process as code changes.
Avoid embedding prompts as raw string literals scattered through your service classes. Instead, define them as named constants in a dedicated location, reference them by name, and log the prompt name (not the full prompt text) alongside each LLM call in your structured logs. This makes it tractable to correlate a change in AI output quality with a specific prompt change in your Git history.
In Symfony, a PromptRegistry service that loads prompts from YAML configuration (with env var interpolation for dynamic values) is a clean approach that keeps prompts editable without touching service classes. In Next.js, a similar pattern using environment-variable-backed constants or a lightweight CMS for prompt storage gives you operational flexibility.
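As a rough TypeScript equivalent of such a registry, the sketch below keeps prompts as named, versioned templates and produces a log entry that carries the prompt name and version but not the prompt text. The prompt names, the {{variable}} placeholder syntax and the version field are all assumptions made for the example.

```typescript
interface PromptDefinition {
  readonly version: number;
  readonly template: string; // {{name}} marks a variable to fill in
}

// Hypothetical prompt names; in practice these live in one reviewed file.
type PromptName = "ticket.summary" | "contract.risks";

const PROMPTS: Record<PromptName, PromptDefinition> = {
  "ticket.summary": {
    version: 3,
    template: "Summarise this support ticket in {{maxBullets}} bullets:\n{{ticket}}",
  },
  "contract.risks": {
    version: 1,
    template: "List the risks in this contract clause:\n{{clause}}",
  },
};

interface RenderedPrompt {
  readonly name: PromptName;
  readonly version: number;
  readonly text: string;
}

// Fill in the template; a missing variable is a programming error, so fail loudly.
function renderPrompt(name: PromptName, variables: Record<string, string>): RenderedPrompt {
  const definition = PROMPTS[name];
  const text = definition.template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    const value = variables[key];
    if (value === undefined) throw new Error(`prompt ${name}: missing variable ${key}`);
    return value;
  });
  return { name, version: definition.version, text };
}

// Structured log line: identifies the prompt without leaking its content.
function promptLogEntry(rendered: RenderedPrompt, model: string): string {
  return JSON.stringify({ prompt: rendered.name, promptVersion: rendered.version, model });
}
```

With the name and version in every log line, a drop in output quality can be lined up against the commit that bumped the version.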
Where to Start
Adding AI features to an existing SaaS is engineering work, not a prompt engineering exercise. The architectural decisions — where the LLM boundary lives, how you handle non-determinism, what your fallback strategy is, how you manage staleness — determine whether a feature holds up in production or creates a new class of reliability problems.
If you are evaluating where to start, pick one narrow, low-stakes feature (a summary, a classification, a draft generator for a form field) and build the full stack correctly: structured output, validation, fallback handling, and staleness management. Get that right on a single feature before expanding AI coverage across the product.
If your Symfony or Next.js codebase needs experienced hands for this kind of work — whether that is an architecture review before you begin, design sessions to define the LLM boundary, or implementation support — Wolf-Tech works with SaaS teams at exactly this stage. We have helped teams retrofit AI capabilities into complex PHP and React codebases without disrupting what is already working. Get in touch at hello@wolf-tech.io to discuss your situation.

