Prometheus + Grafana + OpenTelemetry + Sentry: The Integration Checklist for a Production-Ready Observability Stack

Sandor Farkas - Founder & Lead Developer at Wolf-Tech


Most observability problems are not tool problems. They are integration problems. You have Prometheus scraping metrics, OpenTelemetry emitting traces, Loki collecting logs, and Sentry catching exceptions — but each system lives in its own silo. An alert fires at 2 AM, and you spend twenty minutes copy-pasting trace IDs between four browser tabs trying to build a picture that your tooling should have assembled for you.

This post gives you an integration checklist for Prometheus, Grafana, OpenTelemetry, and Sentry that actually connects those systems. It covers collector configuration, SDK instrumentation for Symfony and Next.js, Grafana datasource wiring, alert routing from Alertmanager to Slack and PagerDuty, and the three integration mistakes that generate phantom alerts or, worse, silent failures.


The Four-Layer Stack and How the Pieces Fit

Before diving into the checklist, a quick map of responsibilities:

  • Prometheus collects and stores time-series metrics. It scrapes /metrics endpoints on a schedule and evaluates alerting rules.
  • Grafana visualises everything. It acts as a unified frontend for Prometheus, Loki, Tempo, and — via the Sentry datasource plugin — your error volumes.
  • OpenTelemetry (OTel) is the instrumentation layer. The OTel SDK in your application emits traces, metrics, and logs. The OTel Collector receives them and routes them to the right backends (Tempo for traces, Loki for logs, Prometheus remote-write for metrics).
  • Sentry catches errors, groups them into issues, and tracks releases. Its strongest suit is stacktraces with source-context and release-level regression detection.

The integration goal is correlation: when a Grafana alert fires on a Prometheus metric, you should be able to click through to a Tempo trace from the same time window, see the related Loki log lines, and land on the Sentry issue that contains the stacktrace. Each section below moves you one step closer to that.


Phase 1: Prometheus and the OTel Collector

Collector Configuration

The OTel Collector is the central routing hub. Run it as a sidecar or a dedicated service; the latter scales better.

Checklist:

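  • Deploy otelcol-contrib (not the core build, which does not bundle every component used here, such as the loki exporter)
  • Configure a prometheus receiver that scrapes your application's /metrics endpoint (port 9464 in this post's examples); the receiver pulls from targets, it does not listen on a port itself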
  • Configure an otlp receiver (gRPC on 4317, HTTP on 4318) for traces and logs from the SDK
  • Add a batch processor with send_batch_size: 1024 and timeout: 5s to avoid hammering backends with tiny payloads
  • Add a memory_limiter processor capped at roughly 80% of the container's memory limit (limit_percentage: 80) and place it first in every pipeline; without it, a traffic burst can get the Collector OOM-killed
  • Route trace data to an otlp exporter pointing at Tempo
  • Route log data to a loki exporter
  • Route metrics to a prometheusremotewrite exporter pointing at your Prometheus instance (or Mimir / Thanos if you're scaling horizontally)
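
The checklist above could translate into a Collector config along these lines. This is a minimal sketch, not a tuned production file: the hostnames (app, tempo, loki, prometheus) are assumed service names, and it assumes Tempo's OTLP gRPC receiver is enabled on 4317 and that Prometheus accepts remote-write.

```yaml
# otel-collector.yaml (sketch). Hostnames and ports are assumptions.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: your-app
          scrape_interval: 15s
          static_configs:
            - targets: ['app:9464']

processors:
  memory_limiter:            # first in every pipeline
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch:
    send_batch_size: 1024
    timeout: 5s

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true         # plaintext only inside a private network
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
```

Prometheus only accepts the remote-write traffic when started with --web.enable-remote-write-receiver. If the Collector scrapes the app as shown here, Prometheus should not scrape the same target directly.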

The most common mistake here: using prometheusremotewrite and also leaving a separate Prometheus scrape job for the same service. You will see duplicate time series and erratic rate calculations. Pick one ingestion path.

Prometheus Scrape Configuration

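A minimal scrape config that avoids label collisions. Keep the your-app job only if Prometheus is the single ingestion path for that service; if the Collector already scrapes the app and remote-writes the result, drop it: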

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8888']
  - job_name: 'your-app'
    static_configs:
      - targets: ['app:9464']
    honor_labels: true
    scrape_interval: 15s

Confirm the Prometheus /targets page shows all jobs as UP before moving to alerting rules.


Phase 2: SDK Instrumentation in Symfony and Next.js

Symfony (PHP)

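Use open-telemetry/opentelemetry-auto-symfony for automatic instrumentation of HTTP requests and Messenger dispatches. Doctrine queries may need a separate auto-instrumentation package, so check which packages cover the libraries and versions you run.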

composer require open-telemetry/sdk open-telemetry/exporter-otlp open-telemetry/opentelemetry-auto-symfony

Key environment variables for .env.prod:

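# Required for the PHP SDK to register auto-instrumentation at startup
OTEL_PHP_AUTOLOAD_ENABLED=true
OTEL_SERVICE_NAME=your-app-name
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Sample 10% of new traces; child spans follow the parent's decision
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1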

Checklist:

  • Verify that ext-opentelemetry is loaded (php -m should list opentelemetry); the auto-instrumentation package requires the C extension
  • Add custom spans for any business-critical code path not automatically instrumented (e.g., pricing calculations, third-party API calls): start one with $tracer->spanBuilder('pricing.calculate')->startSpan() and call ->end() on it in a finally block, because a span that is never ended is never exported
  • Confirm traces appear in Tempo within 30 seconds of a test request

Next.js (Node)

npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node

Checklist:

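  • Create instrumentation.ts at the project root (or inside src/ if you use that layout). Next.js 15 loads this file automatically; on Next.js 13.4 through 14 you also need experimental.instrumentationHook: true in next.config.js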
  • Initialise NodeSDK with an OTLP exporter pointing at the Collector
  • Set OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production in your environment config
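  • Pass getNodeAutoInstrumentations() to the NodeSDK instrumentations option (or call registerInstrumentations yourself). This covers HTTP, common database clients, and Redis automatically; fetch coverage depends on the package and Node versions, so confirm it with a test request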
  • Verify spans in Tempo. If you see spans with no parent (broken trace tree), check that traceparent headers are propagated between services — the W3C Trace Context propagator must be configured on both ends

Phase 3: Grafana Datasource Wiring

A Grafana instance that only shows Prometheus data is half-connected. The full setup requires four datasources and two exemplar links.

Checklist:

  • Add Prometheus datasource — mark it as default
  • Add Tempo datasource, set Trace to logs to point at Loki using traceId as the log label filter
  • Add Loki datasource, set Derived fields to extract traceId from log lines and link to Tempo
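  • In the Prometheus datasource config, enable Exemplars and add an internal link to the Tempo datasource keyed on the exemplar's trace ID label. Prometheus must also run with --enable-feature=exemplar-storage, and your metrics must carry exemplars. This lets you click a spike on a latency histogram and jump directly to a sample trace
  • Install the Sentry datasource plugin (grafana-sentry-datasource) and connect it with a Sentry auth token that has read access to the organisation, its projects, and their issues and events; project:read alone may not be enough for issue queries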
  • Verify the Explore tab for each datasource returns data before building dashboards
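
The same wiring can be provisioned from a file instead of clicked together in the UI. The sketch below assumes the service hostnames from the compose stack, a trace_id exemplar label, and log lines that contain traceId=<id>. The org slug and token are placeholders.

```yaml
# grafana/provisioning/datasources/datasources.yaml (sketch; path and names are assumptions)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id            # exemplar label that carries the trace ID
          datasourceUid: tempo
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        filterByTraceID: true
        spanStartTimeShift: '-5m'
        spanEndTimeShift: '5m'
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'traceId=(\w+)'
          datasourceUid: tempo
          url: '$${__value.raw}'    # $$ escapes provisioning-time variable expansion
  - name: Sentry
    type: grafana-sentry-datasource
    access: proxy
    jsonData:
      url: https://sentry.io
      orgSlug: your-org             # placeholder
    secureJsonData:
      authToken: REPLACE_ME         # placeholder; inject from a secret
```

Fixed uid values are what make the cross-links stable: Tempo and Loki refer to each other by uid, so renaming a datasource later does not break the jump from a log line to its trace.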

The Ten Panels Every Service Needs at Launch

Add these to a single per-service dashboard. They cover the signals that matter before you have the luxury of full-scale observability engineering:

  1. Request rate — rate(http_server_requests_total[5m])
  2. Error rate (%) — errors divided by total requests
  3. p50 / p95 / p99 latency — from the http_request_duration_seconds_bucket histogram
  4. Active DB connections
  5. Queue depth — if using Symfony Messenger or a background job queue
  6. Memory usage vs limit
  7. CPU throttling — signals container resource pressure before it becomes a latency problem
  8. Sentry error volume — issues per minute from the Sentry datasource
  9. Deployment markers — annotated from CI/CD via the Grafana Annotations API, so you can see exactly when a new release landed on any graph
  10. Apdex score — a single human-readable number combining latency and error rate
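
Several of these panels reduce to a PromQL expression, and recording rules keep the dashboard fast. The sketch below reuses the metric names mentioned above and assumes a status label on the request counter and histogram buckets at 0.5 s and 2 s. Adjust all three to what your instrumentation actually exports.

```yaml
# rules/service_panels.yml (sketch; label names and bucket bounds are assumptions)
groups:
  - name: service_panels
    rules:
      # Panel 1: request rate per service
      - record: service:http_requests:rate5m
        expr: sum by (job) (rate(http_server_requests_total[5m]))
      # Panel 2: share of requests answered with a 5xx status
      - record: service:http_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_server_requests_total{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_server_requests_total[5m]))
      # Panel 3: p95 latency from the histogram buckets
      - record: service:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
      # Panel 10: Apdex with a 0.5 s target and a 2 s tolerated threshold
      - record: service:apdex:ratio_rate5m
        expr: |
          (
            sum by (job) (rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
            +
            sum by (job) (rate(http_request_duration_seconds_bucket{le="2"}[5m]))
          ) / 2
          /
          sum by (job) (rate(http_request_duration_seconds_count[5m]))
```

The Apdex rule follows the usual histogram approximation and scores latency only. The le values must match the bucket boundaries exactly as exported (some client libraries write "2.0" rather than "2"), otherwise the selector returns nothing.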

Phase 4: Alertmanager and Sentry Release Tracking

Alertmanager to Slack and PagerDuty

Prometheus evaluates alerting rules; Alertmanager handles routing, deduplication, and silencing.

Checklist:

  • Define alerting rules in a rules/ directory mounted into Prometheus, not inline in prometheus.yml — easier to track in git
  • Start with four rules: HighErrorRate (>1% for 5 min), HighLatency (p95 >1 s for 5 min), ServiceDown (up == 0 for 1 min), HighMemoryUsage (>90% for 10 min)
  • Configure Alertmanager routes to send severity: critical alerts to PagerDuty and severity: warning to a Slack channel
  • Set group_wait: 30s, group_interval: 5m, repeat_interval: 4h explicitly. These match Alertmanager's documented defaults, but copied example configs often shorten them, and shorter intervals generate alert fatigue within a week
  • Add inhibition rules: suppress HighErrorRate and HighLatency when ServiceDown is already firing for the same service — you do not need three simultaneous pages for one outage
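
As a starting point, the rules and routing from this checklist might look like the two files below. Treat them as sketches: the metric and label names are the ones assumed elsewhere in this post, the memory rule assumes cAdvisor-style container metrics, and the Slack webhook and PagerDuty key are placeholders.

```yaml
# rules/service_alerts.yml (sketch; metric and label names are assumptions)
groups:
  - name: service_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_server_requests_total{status=~"5.."}[5m]))
          / sum by (job) (rate(http_server_requests_total[5m])) > 0.01
        for: 5m
        labels: {severity: critical}
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels: {severity: warning}
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels: {severity: critical}
      - alert: HighMemoryUsage
        # container_spec_memory_limit_bytes is 0 when no limit is set, hence the filter
        expr: |
          container_memory_working_set_bytes
          / (container_spec_memory_limit_bytes > 0) > 0.9
        for: 10m
        labels: {severity: warning}
```

```yaml
# alertmanager.yml (sketch; receiver credentials are placeholders)
route:
  receiver: slack-warnings            # default route: warnings go to Slack
  group_by: [alertname, job]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers: ['severity="critical"']
      receiver: pagerduty-critical
receivers:
  - name: slack-warnings
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: '#alerts'
        send_resolved: true
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: REPLACE_ME       # Events API v2 integration key
inhibit_rules:
  # While ServiceDown fires for a job, mute its error-rate and latency alerts
  - source_matchers: ['alertname="ServiceDown"']
    target_matchers: ['alertname=~"HighErrorRate|HighLatency"']
    equal: [job]
```

The inhibition only works when source and target alerts share the same job label value, which is why the rule expressions aggregate by job rather than dropping it.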

Sentry Release Tracking

Sentry's release tracking is underused but essential for distinguishing "this bug existed for weeks" from "this regression appeared in today's deploy."

Checklist:

  • Set SENTRY_RELEASE to the git commit SHA in your CI/CD pipeline (or use sentry-cli releases propose-version)
  • Call sentry-cli releases finalize $RELEASE at the end of a successful deploy
  • Upload source maps from the Next.js build to Sentry so stacktraces show original TypeScript rather than minified output
  • Configure the Sentry SDK's trace sample rate (traces_sample_rate, or tracesSampleRate in the JavaScript SDK) to match your OTel sampling rate. If OTel samples 10% and Sentry samples 100%, you will see performance transactions that have no corresponding OTel trace
  • In Sentry project settings, enable Connect traces and paste the Tempo URL pattern to create direct links between a Sentry trace and the corresponding Tempo trace
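
In a pipeline, the release steps could be sequenced roughly as follows. This is a hypothetical GitHub Actions fragment: the job layout, secret name, org and project slugs, and the .next upload path are all assumptions to adapt, and the upload may need extra options depending on how your build serves its assets.

```yaml
# .github/workflows/deploy.yml (fragment, hypothetical)
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
      SENTRY_ORG: your-org            # placeholder
      SENTRY_PROJECT: your-project    # placeholder
      SENTRY_RELEASE: ${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - run: npm install --global @sentry/cli
      # Create the release before uploading artifacts to it
      - run: sentry-cli releases new "$SENTRY_RELEASE"
      - run: sentry-cli sourcemaps upload --release "$SENTRY_RELEASE" .next
      # Your deploy step goes here; finalize only after it succeeds
      - run: sentry-cli releases finalize "$SENTRY_RELEASE"
```

Because SENTRY_RELEASE is exported for the whole job, the build and the runtime SDK can read the same value, which is what ties an event back to the commit that produced it.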

The Three Integration Mistakes That Cause Phantom Alerts

After auditing observability stacks for SaaS clients at various stages of growth, three configuration errors appear consistently.

1. Mismatched sampling rates across tools. OTel at 10% sampling and Sentry at 100% is the most common culprit. Sentry captures an error, generates a sentry-trace header, and tries to correlate it with an OTel trace that was never sampled. The result: broken trace links and a false impression that your tracing is broken when the tool configuration is simply misaligned.

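Fix: set both OTEL_TRACES_SAMPLER_ARG and Sentry's traces_sample_rate to the same value, or let one system own the sampling decision, for example by feeding OTel spans into Sentry through its OpenTelemetry integration so both tools see the same sampled traces.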

2. Clock skew between hosts. Traces reconstructed from spans emitted on different machines require consistent timestamps. Containers share their host's kernel clock, so the skew comes from hosts (or the Docker Desktop VM) drifting apart. Even 200 ms of drift produces traces with overlapping or negative-duration spans. These look like instrumentation bugs but are infrastructure issues.

Fix: run NTP or chrony on every host that emits spans and monitor the offset. Managed Kubernetes node images usually ship with time sync enabled; self-managed nodes and Docker Compose setups on a desktop VM, whose clock can drift after the machine sleeps, need checking, and that catches teams off guard.

3. Label cardinality explosion from trace IDs in Prometheus. If you accidentally include a trace ID, request ID, or user ID as a Prometheus label, you create one unique time series per request. Prometheus will slow to a crawl and eventually run out of memory.

Fix: audit your label set before deploying to production. Any label whose value is unbounded must be dropped at the Collector or at the application metrics layer before Prometheus ingests it. The OTel Collector's transform processor can strip specific attributes from metrics before they reach Prometheus.
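
A Collector-side guard for this could look like the fragment below. It is a sketch assuming the offending attributes are named trace_id, request_id, and user.id; the receiver, exporter, and other processor names are taken from a typical config and are assumptions here.

```yaml
# otel-collector.yaml (fragment, sketch; attribute names are assumptions)
processors:
  transform/drop_high_cardinality:
    metric_statements:
      - context: datapoint
        statements:
          # Remove unbounded attributes before they become Prometheus labels
          - delete_key(attributes, "trace_id")
          - delete_key(attributes, "request_id")
          - delete_key(attributes, "user.id")

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, transform/drop_high_cardinality, batch]
      exporters: [prometheusremotewrite]
```

Stripping attributes does not re-aggregate the data: points that differed only by the removed attribute now share one label set and can collide on write. Treat this as a safety net and remove the attribute at the application metrics layer where you can.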


The Reference docker-compose Stack

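For local development and staging, this minimal compose file wires the self-hosted components together; Sentry is assumed to be the hosted service, reached through the Grafana plugin and the SDKs. Adapt resource limits and storage volumes before using in production.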

services:
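  prometheus:
    image: prom/prometheus:v2.51.0
    # Setting command replaces the image defaults, so --config.file is restated
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.enable-remote-write-receiver   # accept remote-write from the Collector
      - --enable-feature=exemplar-storage    # needed for exemplar links in Grafana
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./config/rules:/etc/prometheus/rules
    ports: ["9090:9090"]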

  grafana:
    image: grafana/grafana:10.4.2
    environment:
      - GF_INSTALL_PLUGINS=grafana-sentry-datasource
    ports: ["3001:3000"]
    depends_on: [prometheus, loki, tempo]

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.99.0
    volumes:
      - ./config/otel-collector.yaml:/etc/otel/config.yaml
    command: ["--config", "/etc/otel/config.yaml"]
    ports: ["4317:4317", "4318:4318", "8888:8888"]

  tempo:
    image: grafana/tempo:2.4.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./config/tempo.yaml:/etc/tempo.yaml

  loki:
    image: grafana/loki:2.9.7
    ports: ["3100:3100"]

Getting This Right the First Time

Building this stack from scratch takes two to three days of careful configuration. Getting it wrong — mismatched sampling, unbounded label cardinality, siloed alerting — means you have observability tooling that does not actually help you debug production incidents faster. The value of a connected stack only becomes visible when you can move from an alert to a root cause in under five minutes.

If you are instrumenting a Symfony or Next.js application and want a second opinion on your current setup — or need this stack built and validated from the start — hello@wolf-tech.io is the right starting point. Observability architecture is also part of our code quality consulting and custom application development engagements at wolf-tech.io.