Prometheus + Grafana + OpenTelemetry + Sentry: The Integration Checklist for a Production-Ready Observability Stack
Most observability problems are not tool problems. They are integration problems. You have Prometheus scraping metrics, OpenTelemetry emitting traces, Loki collecting logs, and Sentry catching exceptions — but each system lives in its own silo. An alert fires at 2 AM, and you spend twenty minutes copy-pasting trace IDs between four browser tabs trying to build a picture that your tooling should have assembled for you.
This post gives you the Prometheus, Grafana, OpenTelemetry, and Sentry integration checklist that actually connects those systems. It covers collector configuration, SDK instrumentation for Symfony and Next.js, Grafana datasource wiring, alert routing from Alertmanager to Slack and PagerDuty, and the three integration mistakes that generate phantom alerts or, worse, silent failures.
The Four-Layer Stack and How the Pieces Fit
Before diving into the checklist, a quick map of responsibilities:
- Prometheus collects and stores time-series metrics. It scrapes /metrics endpoints on a schedule and evaluates alerting rules.
- Grafana visualises everything. It acts as a unified frontend for Prometheus, Loki, Tempo, and, via the Sentry datasource plugin, your error volumes.
- OpenTelemetry (OTel) is the instrumentation layer. The OTel SDK in your application emits traces, metrics, and logs. The OTel Collector receives them and routes them to the right backends (Tempo for traces, Loki for logs, Prometheus remote-write for metrics).
- Sentry catches errors, groups them into issues, and tracks releases. Its strongest suit is stacktraces with source-context and release-level regression detection.
The integration goal is correlation: when a Grafana alert fires on a Prometheus metric, you should be able to click through to a Tempo trace from the same time window, see the related Loki log lines, and land on the Sentry issue that contains the stacktrace. Each section below moves you one step closer to that.
Phase 1: Prometheus and the OTel Collector
Collector Configuration
The OTel Collector is the central routing hub. Run it as a sidecar or a dedicated service; the latter scales better.
Checklist:
- Deploy otelcol-contrib (not the core build: contrib-only components such as the Loki exporter are missing from core)
- Configure a prometheus receiver to scrape your application's /metrics endpoint on port 9464
- Configure an otlp receiver (gRPC on 4317, HTTP on 4318) for traces and logs from the SDK
- Add a batch processor with send_batch_size: 1024 and timeout: 5s to avoid hammering backends with tiny payloads
- Add a memory_limiter processor set to 80% of the container's memory limit and place it first in each pipeline; without it the Collector can be OOM-killed under production load
- Route trace data to an otlp exporter pointing at Tempo
- Route log data to a loki exporter
- Route metrics to a prometheusremotewrite exporter pointing at your Prometheus instance (or Mimir / Thanos if you're scaling horizontally); a plain Prometheus server only accepts remote writes when started with --web.enable-remote-write-receiver
The most common mistake here: using prometheusremotewrite and also leaving a separate Prometheus scrape job for the same service. You will see duplicate time series and erratic rate calculations. Pick one ingestion path.
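The checklist above can be sketched as a single Collector config. This is a minimal sketch, not a definitive setup: the hostnames (app, tempo, loki, prometheus) are assumed to match your service names, Tempo is assumed to accept OTLP over gRPC on 4317, and TLS is disabled on the assumption that everything runs on a private network.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: your-app            # assumed job name
          scrape_interval: 15s
          static_configs:
            - targets: ['app:9464']     # assumed app host and metrics port

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80                # 80% of the memory available to the container
    spike_limit_percentage: 20
  batch:
    send_batch_size: 1024
    timeout: 5s

exporters:
  otlp/tempo:
    endpoint: tempo:4317                # assumed Tempo OTLP gRPC endpoint
    tls:
      insecure: true                    # assumption: private network, no TLS
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter runs first
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
```

With this layout the Collector is the only thing scraping the application, which is one way to honour the "pick one ingestion path" rule.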
Prometheus Scrape Configuration
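A minimal scrape config that avoids label collisions. Keep the your-app job only if the Collector is not already forwarding that service's metrics through remote write; otherwise you reintroduce the duplicate ingestion path described above: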
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8888']
  - job_name: 'your-app'
    static_configs:
      - targets: ['app:9464']
    honor_labels: true
    scrape_interval: 15s
Confirm the Prometheus /targets page shows all jobs as UP before moving to alerting rules.
Phase 2: SDK Instrumentation in Symfony and Next.js
Symfony (PHP)
Use open-telemetry/opentelemetry-auto-symfony for automatic instrumentation of HTTP requests, Doctrine queries, and Messenger dispatches.
composer require open-telemetry/sdk open-telemetry/exporter-otlp open-telemetry/opentelemetry-auto-symfony
Key environment variables for .env.prod:
# Lets the SDK bootstrap itself from the variables below
OTEL_PHP_AUTOLOAD_ENABLED=true
OTEL_SERVICE_NAME=your-app-name
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
Checklist:
- Verify that ext-opentelemetry is loaded in php.ini; the auto-instrumentation package requires the C extension
- Add custom spans for any business-critical code path not automatically instrumented (e.g., pricing calculations, third-party API calls): use $tracer->spanBuilder('pricing.calculate')->startSpan(), and call end() on the span when the work finishes
- Confirm traces appear in Tempo within 30 seconds of a test request
Next.js (Node)
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
Checklist:
- Create instrumentation.ts at the project root, or in src/ if the project uses that layout (Next.js 15+ registers it automatically; on 13.4 and 14, enable experimental.instrumentationHook in next.config.js)
- Initialise NodeSDK with an OTLP exporter pointing at the Collector
- Set OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production in your environment config
- Pass getNodeAutoInstrumentations() to NodeSDK through its instrumentations option (or call registerInstrumentations directly); this covers HTTP, fetch, database clients, and Redis automatically
- Verify spans in Tempo. If you see spans with no parent (broken trace tree), check that traceparent headers are propagated between services; the W3C Trace Context propagator must be configured on both ends
Phase 3: Grafana Datasource Wiring
A Grafana instance that only shows Prometheus data is half-connected. The full setup requires four datasources and the cross-links between them: exemplars from metrics to traces, trace-to-logs, and logs back to traces.
Checklist:
- Add Prometheus datasource and mark it as default
- Add Tempo datasource, set Trace to logs to point at Loki using traceId as the log label filter
- Add Loki datasource, set Derived fields to extract traceId from log lines and link to Tempo
- In the Prometheus datasource config, enable Exemplars and set the URL template pointing at Tempo; this lets you click a spike on a latency histogram and jump directly to a sample trace
- Install the Sentry datasource plugin (grafana-sentry-datasource) and connect it with a Sentry auth token scoped to project:read
- Verify the Explore tab for each datasource returns data before building dashboards
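The same wiring can be kept in version control as a Grafana provisioning file instead of being clicked together in the UI. The sketch below is illustrative only: the uid values, the service hostnames, the exemplar label name trace_id, and the traceId=... log format matched by the regex are all assumptions to adapt to your setup. The Sentry datasource is left out because it needs an organisation slug and a secret token.

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    uid: prometheus                     # assumed uid, referenced nowhere else here
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id                # assumed exemplar label carrying the trace ID
          datasourceUid: tempo          # internal link to the Tempo datasource below

  - name: Tempo
    uid: tempo
    type: tempo
    access: proxy
    url: http://tempo:3200              # Tempo's default HTTP port
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        filterByTraceID: true
        spanStartTimeShift: '-5m'       # widen the log search window around the span
        spanEndTimeShift: '5m'

  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'traceId=(\w+)' # assumes logs contain traceId=<hex>
          datasourceUid: tempo
          url: '$${__value.raw}'        # $$ escapes provisioning's env interpolation
```

Grafana reads files like this from /etc/grafana/provisioning/datasources/ at startup, so a broken link shows up as a failed deploy rather than as a surprise during an incident.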
The Ten Panels Every Service Needs at Launch
Add these to a single per-service dashboard. They cover the signals that matter before you have the luxury of full-scale observability engineering:
- Request rate: rate(http_server_requests_total[5m]) (exact metric names depend on your instrumentation library)
- Error rate (%): errors divided by total requests
- p50 / p95 / p99 latency: from the http_request_duration_seconds_bucket histogram
- Active DB connections
- Queue depth: if using Symfony Messenger or a background job queue
- Memory usage vs limit
- CPU throttling: signals container resource pressure before it becomes a latency problem
- Sentry error volume: issues per minute from the Sentry datasource
- Deployment markers: annotated from CI/CD via the Grafana Annotations API, so you can see exactly when a new release landed on any graph
- Apdex score: a single human-readable number combining latency and error rate
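Several of these panels query the same expressions, so it can help to precompute them as Prometheus recording rules. The following is a sketch under stated assumptions: the metric names are the ones used in the list above, the rule names are hypothetical, and the Apdex rule assumes histogram buckets exist at exactly 0.5 s (the target) and 2 s (the tolerated limit). This Apdex variant is latency-only; failed requests count against it only if your instrumentation records them in the histogram.

```yaml
groups:
  - name: service-dashboard             # hypothetical group name
    rules:
      # Request rate per service
      - record: job:http_server_requests:rate5m
        expr: sum by (job) (rate(http_server_requests_total[5m]))

      # p95 latency per service
      - record: job:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))

      # Apdex = (satisfied + tolerating / 2) / total.
      # The le="2" bucket already contains the le="0.5" requests,
      # so adding both buckets and halving gives the same result.
      - record: job:http_request_apdex:ratio_5m
        expr: |
          (
            sum by (job) (rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
            +
            sum by (job) (rate(http_request_duration_seconds_bucket{le="2"}[5m]))
          ) / 2
          / sum by (job) (rate(http_request_duration_seconds_count[5m]))
```

Dashboards then read the recorded series directly, which keeps panel queries short and cheap.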
Phase 4: Alertmanager and Sentry Release Tracking
Alertmanager to Slack and PagerDuty
Prometheus evaluates alerting rules; Alertmanager handles routing, deduplication, and silencing.
Checklist:
- Define alerting rules in a rules/ directory mounted into Prometheus and referenced from rule_files in prometheus.yml; separate files are easier to review and track in git
- Start with four rules: HighErrorRate (>1% for 5 min), HighLatency (p95 >1 s for 5 min), ServiceDown (up == 0 for 1 min), HighMemoryUsage (>90% for 10 min)
- Configure Alertmanager routes to send severity: critical alerts to PagerDuty and severity: warning to a Slack channel
- Set group_wait: 30s, group_interval: 5m, repeat_interval: 4h explicitly. These match Alertmanager's defaults; the much shorter values often copied from example configs are too aggressive and generate alert fatigue within a week
- Add inhibition rules: suppress HighErrorRate and HighLatency when ServiceDown is already firing for the same service; you do not need three simultaneous pages for one outage
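As a rough sketch, the four starter rules could look like the file below. The status label, the severity assigned to each rule, and the cAdvisor memory metrics are assumptions; substitute whatever your instrumentation and container runtime actually expose.

```yaml
# rules/service.yml (hypothetical file name)
groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_server_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_server_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
      - alert: HighMemoryUsage
        # The "> 0" filter skips containers that have no memory limit set
        expr: |
          container_memory_working_set_bytes
            / (container_spec_memory_limit_bytes > 0) > 0.9
        for: 10m
        labels:
          severity: warning
```

A matching Alertmanager config, again only a sketch with placeholder credentials and an assumed channel name, routes on the severity label and inhibits the symptom alerts while ServiceDown fires for the same job:

```yaml
route:
  receiver: slack-warnings              # fallback receiver
  group_by: [alertname, job]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-critical
    - matchers:
        - severity="warning"
      receiver: slack-warnings

receivers:
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: 'REPLACE_WITH_INTEGRATION_KEY'   # placeholder
  - name: slack-warnings
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder
        channel: '#alerts'                                       # assumed channel

inhibit_rules:
  - source_matchers:
      - alertname="ServiceDown"
    target_matchers:
      - alertname=~"HighErrorRate|HighLatency"
    equal: [job]                        # only inhibit alerts for the same service
```

The inhibition only works if source and target alerts share the job label, which is why the rule expressions above aggregate by job.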
Sentry Release Tracking
Sentry's release tracking is underused but essential for distinguishing "this bug existed for weeks" from "this regression appeared in today's deploy."
Checklist:
- Set SENTRY_RELEASE to the git commit SHA in your CI/CD pipeline (or use sentry-cli releases propose-version)
- Call sentry-cli releases finalize $RELEASE at the end of a successful deploy
- Upload source maps from the Next.js build to Sentry so stacktraces show original TypeScript rather than minified output
- Configure traces_sample_rate in the Sentry SDK (tracesSampleRate in the JavaScript SDK) to match your OTel sampling rate; if OTel samples 10% and Sentry samples 100%, you will see performance transactions that have no corresponding OTel trace
- In Sentry project settings, enable Connect traces and paste the Tempo URL pattern to create direct links between a Sentry trace and the corresponding Tempo trace
The Three Integration Mistakes That Cause Phantom Alerts
After auditing observability stacks for SaaS clients at various stages of growth, three configuration errors appear consistently.
1. Mismatched sampling rates across tools. OTel at 10% sampling and Sentry at 100% is the most common culprit. Sentry captures an error, generates a sentry-trace header, and tries to correlate it with an OTel trace that was never sampled. The result: broken trace links and a false impression that your tracing is broken when the tool configuration is simply misaligned.
Fix: set both OTEL_TRACES_SAMPLER_ARG and Sentry's traces_sample_rate to the same value, or make one system the sampling authority and have the other honour the sampling decision it propagates.
2. Clock skew between hosts. Traces reconstructed from spans emitted on different machines require consistent timestamps. A host whose clock has drifted by even 200 ms will produce traces with overlapping or negative-duration spans. These look like instrumentation bugs but are infrastructure issues.
Fix: containers read their host's kernel clock, so keep every host and node synchronised with NTP or chrony. Managed Kubernetes node images usually ship with time sync enabled; self-managed Docker hosts and desktop VMs often do not, and it catches teams off guard.
3. Label cardinality explosion from trace IDs in Prometheus. If you accidentally include a trace ID, request ID, or user ID as a Prometheus label, you create one unique time series per request. Prometheus will slow to a crawl and eventually run out of memory.
Fix: audit your label set before deploying to production. Any label whose value is unbounded must be dropped at the Collector or at the application metrics layer before Prometheus ingests it. The OTel Collector's transform processor can strip specific attributes from metrics before they reach Prometheus.
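A minimal sketch of that Collector-side cleanup, assuming the offending attributes are named trace_id, request_id, and user.id (hypothetical names; check what your SDK actually emits). The processor name after the slash is arbitrary, and the fragment assumes the receivers, exporters, memory_limiter, and batch components are defined elsewhere in the same config.

```yaml
processors:
  transform/drop_unbounded:
    metric_statements:
      - context: datapoint
        statements:
          # Remove per-request identifiers from every metric data point
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "request_id")
          - delete_key(attributes, "user.id")

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      # Strip attributes before batching and export
      processors: [memory_limiter, transform/drop_unbounded, batch]
      exporters: [prometheusremotewrite]
```

Treat this as a safety net, not the primary fix: once the attribute is removed, data points that differed only by it collapse onto the same label set, so removing the label where the metric is created is the cleaner option.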
The Reference docker-compose Stack
For local development and staging, this minimal compose file wires all four tools together. Adapt resource limits and storage volumes before using in production.
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./config/rules:/etc/prometheus/rules
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:10.4.2
    environment:
      - GF_INSTALL_PLUGINS=grafana-sentry-datasource
    ports: ["3001:3000"]
    depends_on: [prometheus, loki, tempo]
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.99.0
    volumes:
      - ./config/otel-collector.yaml:/etc/otel/config.yaml
    command: ["--config", "/etc/otel/config.yaml"]
    ports: ["4317:4317", "4318:4318", "8888:8888"]
  tempo:
    image: grafana/tempo:2.4.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./config/tempo.yaml:/etc/tempo.yaml
  loki:
    image: grafana/loki:2.9.7
    ports: ["3100:3100"]
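If the Collector pushes metrics to this Prometheus container through prometheusremotewrite, Prometheus has to be started with its remote-write receiver enabled, and exemplar links in Grafana additionally need exemplar storage. One possible way to add both, sketched here as a compose override so the base file stays untouched:

```yaml
# docker-compose.override.yml
services:
  prometheus:
    # Setting command replaces the image's default arguments,
    # so the config file and storage path are restated explicitly.
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.enable-remote-write-receiver
      - --enable-feature=exemplar-storage
```

Without the first flag, writes to /api/v1/write are rejected and the Collector's metrics pipeline logs export errors while traces and logs keep flowing, which is exactly the kind of silent partial failure this checklist is meant to catch.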
Getting This Right the First Time
Building this stack from scratch takes two to three days of careful configuration. Getting it wrong — mismatched sampling, unbounded label cardinality, siloed alerting — means you have observability tooling that does not actually help you debug production incidents faster. The value of a connected stack only becomes visible when you can move from an alert to a root cause in under five minutes.
If you are instrumenting a Symfony or Next.js application and want a second opinion on your current setup — or need this stack built and validated from the start — hello@wolf-tech.io is the right starting point. Observability architecture is also part of our code quality consulting and custom application development engagements at wolf-tech.io.

