Telemetry (OpenTelemetry)

discord-mcp ships with OpenTelemetry traces and metrics built on top of the official OTel JS SDK. This page covers how to enable telemetry, where to send the data, and what signals you get out of the box.

The SDK is disabled by default: unless you set OTEL_ENABLED=true, the server starts without booting the SDK or wiring an exporter, so telemetry adds no runtime overhead. All telemetry configuration is driven by environment variables; no code changes are needed.
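To make the environment-variable model concrete, here is a minimal sketch of how such settings could be resolved, using the defaults from the reference table on this page. The `TelemetryConfig` shape and the `readTelemetryConfig` helper are hypothetical names for illustration, not the server's actual internals, and the exact-match check on "true" is an assumption. OTEL_SERVICE_VERSION is left out because its default comes from the package version.

```typescript
// Hypothetical config shape, for illustration only.
interface TelemetryConfig {
  enabled: boolean;
  serviceName: string;
  endpoint: string | undefined;
  protocol: string;
  sampler: string;
  samplerArg: number;
  consoleExporter: boolean;
}

type Env = { [key: string]: string | undefined };

// Resolve telemetry settings from environment variables, falling back to the
// documented defaults when a variable is unset.
function readTelemetryConfig(env: Env): TelemetryConfig {
  const samplerArg = Number(env.OTEL_TRACES_SAMPLER_ARG ?? "1");
  return {
    // Assumption: only the exact string "true" turns telemetry on.
    enabled: env.OTEL_ENABLED === "true",
    serviceName: env.OTEL_SERVICE_NAME ?? "discord-mcp",
    endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT,
    protocol: env.OTEL_EXPORTER_OTLP_PROTOCOL ?? "http/protobuf",
    sampler: env.OTEL_TRACES_SAMPLER ?? "parentbased_always_on",
    // Fall back to 1 when the value is not a usable number.
    samplerArg: isFinite(samplerArg) ? samplerArg : 1,
    consoleExporter: env.OTEL_CONSOLE_EXPORTER === "true",
  };
}

// With nothing set, telemetry stays off.
const defaults = readTelemetryConfig({});
console.log(defaults.enabled); // false
```

Passing the environment in as a parameter, rather than reading a global, keeps the sketch easy to test.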

Pick the exporter that matches your environment.

For Honeycomb, set the OTLP endpoint, pass your team API key and dataset name as headers, then start the server:

export OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_API_KEY,x-honeycomb-dataset=discord-mcp
export OTEL_SERVICE_NAME=discord-mcp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
npx discord-mcp

Traces appear in the Honeycomb UI under the discord-mcp dataset within about 10 seconds; the delay comes from the OTLP exporter batching spans before sending them.
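The OTEL_EXPORTER_OTLP_HEADERS value used above is a comma-separated list of key=value pairs. The sketch below shows one way to read that format; `parseOtlpHeaders` is a hypothetical helper written for illustration, and the SDK's own parser may differ in details such as percent-decoding.

```typescript
// Parse "k1=v1,k2=v2" into a header map. Each pair is split on the FIRST "="
// only, so values that themselves contain "=" stay intact.
function parseOtlpHeaders(raw: string): { [name: string]: string } {
  const parsed: { [name: string]: string } = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    if (key.length > 0) parsed[key] = value;
  }
  return parsed;
}

const honeycombHeaders = parseOtlpHeaders(
  "x-honeycomb-team=YOUR_API_KEY,x-honeycomb-dataset=discord-mcp"
);
console.log(honeycombHeaders["x-honeycomb-dataset"]); // discord-mcp
```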

| Metric | Type | Description |
| --- | --- | --- |
| mcp.tool.duration_ms | Histogram | Wall-clock duration of each tool call (milliseconds). |
| mcp.tool.calls | Counter | Total tool calls, labelled by status (ok, tool_error, or error). |
| mcp.tool.errors | Counter | Subset of calls that ended in error (tool_error or thrown). |

Common labels on every metric: mcp.tool.name, mcp.tool.category, mcp.tool.idempotent, mcp.transport, status.
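The relationship between the three tool metrics can be sketched as follows: every call records a duration and increments the call counter, and only non-ok outcomes also increment the error counter. The `MetricSink` interface, the in-memory sink, and the `send_message` tool name are hypothetical stand-ins for illustration; the real server uses OTel instruments and attaches the full label set.

```typescript
type CallStatus = "ok" | "tool_error" | "error";
type Labels = { [key: string]: string };

// Hypothetical sink standing in for the OTel histogram and counters.
interface MetricSink {
  record(metric: string, value: number, labels: Labels): void;
}

// Report one finished tool call to the sink.
function recordToolCall(
  sink: MetricSink,
  toolName: string,
  outcome: CallStatus,
  durationMs: number
): void {
  const labels: Labels = { "mcp.tool.name": toolName, status: outcome };
  sink.record("mcp.tool.duration_ms", durationMs, labels);
  sink.record("mcp.tool.calls", 1, labels);
  // Errors are a subset of calls: only non-ok outcomes are counted here.
  if (outcome !== "ok") sink.record("mcp.tool.errors", 1, labels);
}

// In-memory sink that sums values per metric name, for demonstration only.
const totals: { [metric: string]: number } = {};
const demoSink: MetricSink = {
  record(metric, value) {
    totals[metric] = (totals[metric] ?? 0) + value;
  },
};

recordToolCall(demoSink, "send_message", "ok", 42);
recordToolCall(demoSink, "send_message", "tool_error", 17);
console.log(totals["mcp.tool.calls"], totals["mcp.tool.errors"]); // 2 1
```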

| Metric | Type | Description |
| --- | --- | --- |
| mcp.circuit.transitions | Counter | Circuit breaker state transitions (closed → open, open → half-open, half-open → closed). Labelled by from, to, route. |
| mcp.bulkhead.rejected.count | Counter | Calls fast-rejected because the bulkhead semaphore was full. Labelled by route. |
| mcp.deadletter.count | Counter | Calls that exhausted retries or were circuit-rejected and surfaced to the client as a structured error. Labelled by tool, category, error_code. |

Every tool invocation produces an mcp.tool.<tool_name> SERVER span with:

  • Standard attributes: mcp.tool.name, mcp.tool.category, mcp.tool.idempotent, mcp.transport, mcp.request_id (if known).
  • Span event mcp.tool.args with attribute mcp.args.redacted containing the JSON-stringified, redacted arg payload (see Audit → Privacy redaction for the policy).
  • Status: OK on success; ERROR with the message "tool returned isError" for structured tool errors; or ERROR with the thrown exception's message for crashes.
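The status rule in the last bullet can be sketched as a small mapping function. `ToolOutcome` and `spanStatusFor` are hypothetical names introduced for illustration; they are not the server's actual types.

```typescript
// The two span status values described in the list above.
type SpanStatus = { code: "OK" } | { code: "ERROR"; message: string };

// Hypothetical description of how a tool call ended.
type ToolOutcome =
  | { kind: "result"; isError: boolean }
  | { kind: "thrown"; error: unknown };

function spanStatusFor(outcome: ToolOutcome): SpanStatus {
  if (outcome.kind === "thrown") {
    // Crash: carry the exception message into the span status.
    const message =
      outcome.error instanceof Error ? outcome.error.message : String(outcome.error);
    return { code: "ERROR", message };
  }
  // Structured tool error versus plain success.
  return outcome.isError
    ? { code: "ERROR", message: "tool returned isError" }
    : { code: "OK" };
}

console.log(spanStatusFor({ kind: "thrown", error: new Error("boom") }));
```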

If OTEL_ENABLED=true and the underlying Discord REST call runs inside the active span context, you also get a child CLIENT span from @opentelemetry/instrumentation-undici. It carries the standard http.request.method, url.full, and http.response.status_code attributes, plus a discord.route attribute holding the route template with IDs redacted (e.g. POST /channels/:id/messages).
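As an illustration of what a route-redacted value looks like, the sketch below replaces long numeric path segments with :id. `toRouteTemplate` is a hypothetical helper, and treating any 15 to 20 digit segment as a Discord ID is an assumption; the server's actual redaction rules may cover more cases (webhook tokens, for example).

```typescript
// Build a "METHOD /path/:id/..." template from a concrete request path.
function toRouteTemplate(method: string, path: string): string {
  // Drop any query string before templating.
  const q = path.indexOf("?");
  const pathOnly = q === -1 ? path : path.slice(0, q);
  // Assumption: a path segment of 15-20 decimal digits is an ID.
  const templated = pathOnly.replace(/\/\d{15,20}(?=\/|$)/g, "/:id");
  return method.toUpperCase() + " " + templated;
}

console.log(toRouteTemplate("post", "/channels/123456789012345678/messages"));
// POST /channels/:id/messages
```

Grouping by the template rather than the raw URL keeps label cardinality bounded, which matters for the route label on the resilience metrics.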

For Grafana + Prometheus, useful starter queries:

  • Tool error rate, last 5m: sum by (mcp_tool_name) (rate(mcp_tool_errors_total[5m])) / sum by (mcp_tool_name) (rate(mcp_tool_calls_total[5m]))
  • p95 tool latency: histogram_quantile(0.95, sum by (mcp_tool_name, le) (rate(mcp_tool_duration_ms_bucket[5m])))
  • Circuit open events: sum by (route) (rate(mcp_circuit_transitions_total{to="open"}[1h]))
  • Bulkhead saturation: sum by (route) (rate(mcp_bulkhead_rejected_count_total[5m]))
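The metric names in these queries differ from the OTel names listed earlier because Prometheus traditionally does not allow dots in metric or label names, and counters conventionally get a _total suffix. A simplified sketch of that translation follows; `sanitize` and `toPrometheusName` are illustrative helpers, not an exporter API, and real exporters may apply further rules such as unit suffixes.

```typescript
type InstrumentKind = "counter" | "histogram";

// Replace every character outside [a-zA-Z0-9_] with "_".
function sanitize(name: string): string {
  return name.replace(/[^a-zA-Z0-9_]/g, "_");
}

// Translate an OTel instrument name to the base Prometheus series name.
function toPrometheusName(otelName: string, kind: InstrumentKind): string {
  const base = sanitize(otelName);
  // Counters get a "_total" suffix unless they already have one.
  if (kind === "counter" && !/_total$/.test(base)) return base + "_total";
  // Histograms are exposed as <base>_bucket, <base>_sum and <base>_count.
  return base;
}

console.log(toPrometheusName("mcp.tool.calls", "counter")); // mcp_tool_calls_total
console.log(sanitize("mcp.tool.name")); // mcp_tool_name
```

If a query returns no data, check the exact series names your collector or exporter exposes before adjusting the query logic.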

For Honeycomb, useful starting points for BubbleUp queries and triggers:

  • Slow mcp.tool.<name> spans (P95 > 1s).
  • mcp.tool.errors > 10 per minute.
  • mcp.deadletter.count > 0 (every dead-letter is a real failure the client saw).

The repository does NOT ship exported board or dashboard JSON, because every deployment has different SLOs. Use the queries above as a starting point and tune them.

The default trace sampler is parentbased_always_on: if the incoming request carries a trace context, the parent's sampling decision is honored; otherwise every trace is sampled. This is fine for low-volume bot deployments. For high-volume servers, switch to a ratio-based sampler:

export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05 # 5% sampling

Metrics are NOT affected by trace sampling — counters and histograms are always reported.
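To show how a parent-based ratio sampler reaches its decision, here is a simplified sketch. `shouldSample` is a hypothetical function, and mapping the first 8 hex digits of the trace ID to a number is an assumption made for readability; the SDK's own sampler derives its value from the trace ID differently.

```typescript
// Simplified sketch of parentbased_traceidratio.
function shouldSample(
  traceId: string,
  ratio: number,
  parentSampled?: boolean
): boolean {
  // Parent-based: if the caller already made a decision, follow it.
  if (parentSampled !== undefined) return parentSampled;
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Map the first 32 bits of the trace ID to [0, 1) and compare to the ratio.
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}

// 0x0a000000 / 2^32 = 0.0390625, which is below 0.05.
console.log(shouldSample("0a000000000000000000000000000001", 0.05)); // true
// 0x80000000 / 2^32 = 0.5, which is above 0.05.
console.log(shouldSample("80000000000000000000000000000001", 0.05)); // false
```

Because the decision depends only on the trace ID, every service that sees the same trace with the same ratio reaches the same answer, so sampled traces stay complete.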

| Var | Default | Description |
| --- | --- | --- |
| OTEL_ENABLED | false | Master switch. When false, the SDK is not booted. |
| OTEL_SERVICE_NAME | discord-mcp | service.name resource attribute. |
| OTEL_SERVICE_VERSION | (package version) | service.version resource attribute. |
| OTEL_EXPORTER_OTLP_ENDPOINT | (unset) | OTLP collector endpoint (e.g. http://localhost:4318). |
| OTEL_EXPORTER_OTLP_PROTOCOL | http/protobuf | One of http/protobuf, http/json, grpc. |
| OTEL_EXPORTER_OTLP_HEADERS | (unset) | Comma-separated key=value pairs (e.g. for vendor auth). |
| OTEL_TRACES_SAMPLER | parentbased_always_on | One of the standard OTel samplers. |
| OTEL_TRACES_SAMPLER_ARG | 1 | Ratio for ratio-based samplers. |
| OTEL_CONSOLE_EXPORTER | false | When true, also writes spans to stderr as JSON (debug aid). |

Related pages:

  • Resilience: circuit/bulkhead/retry knobs that emit the resilience metrics above.
  • Audit: mutating-call trail; uses the same redaction policy as span events.
  • Architecture → Middleware chain: where the telemetry middleware sits in the call path.