opentelemetry-expert

Expert OpenTelemetry coverage including SDK architecture, semantic conventions, W3C context propagation, auto vs manual instrumentation, OTel Collector pipeline configuration, sampling strategies (head/tail/parent-based), OTLP protocol, resource attributes, and backend integration

OpenTelemetry Expert

OpenTelemetry is the CNCF standard for generating, collecting, and exporting telemetry data. It solves
the vendor lock-in problem that plagued observability for a decade: instrument once with the OTel SDK,
then route to any backend (Datadog, New Relic, Jaeger, Tempo, Zipkin) by changing collector config.

Core Mental Model

OTel has three planes: instrumentation (SDK in your app), collection (OTel Collector as agent/gateway),
and backends (where data lives and is queried). The SDK emits signals (traces, metrics, logs) using the
OTLP protocol. The Collector receives, processes, and fans out to backends. This separation means you
instrument once and change backends without touching application code. Semantic conventions are the contract
between your instrumentation and your dashboards — violate them and your out-of-the-box dashboards break.

OTel Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your Application                      │
│                                                          │
│  OTel SDK                                                │
│  ├── TracerProvider → creates Tracers → creates Spans    │
│  ├── MeterProvider  → creates Meters  → creates Instruments│
│  └── LoggerProvider → creates Loggers → creates LogRecords│
│                │                                         │
│                │ OTLP (gRPC port 4317 / HTTP port 4318) │
└────────────────┼─────────────────────────────────────────┘
                 │
         ┌───────▼──────────┐
         │  OTel Collector   │  (agent on host OR gateway cluster)
         │                   │
         │  Receivers        │  OTLP, Jaeger, Zipkin, Prometheus
         │  Processors       │  batch, filter, transform, sample
         │  Exporters        │  Jaeger, Tempo, Prometheus, OTLP
         └─────────┬─────────┘
                   │
        ┌──────────┼──────────┐
        ▼          ▼          ▼
     Jaeger      Tempo    Prometheus
   (traces)    (traces)   (metrics)

Semantic Conventions

Semantic conventions define the standard attribute names for common operations. Using them unlocks
automatic dashboards and correlations in observability backends. Note that the HTTP attribute names
below are the older (pre-1.21) conventions that many instrumentations still emit; the stable HTTP
conventions renamed several of them (e.g. http.method → http.request.method, http.status_code →
http.response.status_code, http.target → url.path).

HTTP Spans

# ✅ Standard HTTP server span attributes
span.set_attribute("http.method", "GET")              # HTTP method
span.set_attribute("http.target", "/api/orders/123")  # Request target
span.set_attribute("http.host", "api.example.com")    # Host header
span.set_attribute("http.scheme", "https")            # URL scheme
span.set_attribute("http.status_code", 200)           # Response status
span.set_attribute("http.flavor", "1.1")              # HTTP version
span.set_attribute("net.peer.ip", "10.0.1.5")        # Client IP

# DB spans
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.name", "orders")
span.set_attribute("db.operation", "SELECT")
span.set_attribute("db.statement", "SELECT * FROM orders WHERE id = $1")
span.set_attribute("net.peer.name", "db.internal")
span.set_attribute("net.peer.port", 5432)

# Messaging spans (Kafka, RabbitMQ, Pub/Sub)
span.set_attribute("messaging.system", "kafka")
span.set_attribute("messaging.destination", "orders-topic")
span.set_attribute("messaging.operation", "receive")
span.set_attribute("messaging.message_id", "msg-abc123")

# Custom app attributes use your namespace
span.set_attribute("app.order.id", order_id)
span.set_attribute("app.user.tier", "premium")

Span Naming Convention

# Service + operation pattern:
http:  "GET /api/orders/{id}"          (template, not actual ID)
db:    "SELECT orders"
rpc:   "OrderService/CreateOrder"
kafka: "orders process"
queue: "payment_queue receive"

# ❌ Wrong (too specific / high-cardinality):
"GET /api/orders/12345"               # Contains actual order ID
"process order 12345"                 # Cardinality explosion
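
To keep span names low-cardinality when a framework does not hand you the matched route template, a
small normalizer can collapse variable path segments. This is a hypothetical helper sketch
(`templated_route` is not an OTel API; real HTTP instrumentations usually expose the route template
directly):

```python
import re

def templated_route(path: str) -> str:
    """Collapse high-cardinality path segments into placeholders for span names."""
    # UUID segments first, then plain numeric IDs
    path = re.sub(
        r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}",
        "/{uuid}", path)
    path = re.sub(r"/\d+(?=/|$)", "/{id}", path)
    return path

print(templated_route("/api/orders/12345"))  # /api/orders/{id}
```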

Context Propagation: W3C TraceContext

W3C TraceContext HTTP headers:
  traceparent: 00-{trace-id}-{parent-span-id}-{flags}
  tracestate:  vendor-specific state

Example:
  traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
               │  │                                │                │
               │  trace ID (32 hex chars)          parent span ID   flags (01 = sampled)
               version
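
The format above is strict enough to validate with a few lines of stdlib Python. `parse_traceparent`
is an illustrative helper, not part of the OTel API (the SDK's propagators do this parsing for you):

```python
import re

def parse_traceparent(header: str):
    """Parse a version-00 W3C traceparent header into its four fields.
    Returns None if the header is malformed."""
    m = re.fullmatch(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if not m:
        return None
    version, trace_id, span_id, flags = m.groups()
    # All-zero trace and span IDs are invalid per the spec
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }

parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(parsed["trace_id"], parsed["sampled"])  # 4bf92f3577b34da6a3ce929d0e0e4736 True
```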

Baggage (cross-boundary user data):
  baggage: userId=123, region=us-east-1, tenantId=acme

import httpx
from opentelemetry import baggage, context
from opentelemetry.propagate import inject, extract

# Inject context into an outgoing HTTP request (FastAPI → downstream service)
async def call_payment_service(order_id: str):
    headers = {"Content-Type": "application/json"}
    inject(headers)  # ← Adds traceparent (and baggage) headers from the current context
    async with httpx.AsyncClient() as client:
        return await client.post(
            "http://payment-service/charge",
            headers=headers,
            json={"order_id": order_id}
        )

# Extract context from an incoming request (in middleware)
def get_context_from_request(request):
    return extract(dict(request.headers))

# Set and read baggage (set_baggage returns a new Context; attach it to activate it)
ctx = baggage.set_baggage("user.tier", "premium")
token = context.attach(ctx)
try:
    # Spans created here run under ctx and can read its baggage
    user_tier = baggage.get_baggage("user.tier")  # "premium"
finally:
    context.detach(token)

Instrumentation: Auto vs Manual

JavaScript/Express Auto-Instrumentation

// instrumentation.js — load before your app
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'order-api',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION,
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV,
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4317' }),
    exportIntervalMillis: 10000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },  // PostgreSQL
      '@opentelemetry/instrumentation-redis': { enabled: true },
    }),
  ],
});

sdk.start();

// Run with: node -r ./instrumentation.js app.js

Python Manual Spans with Rich Attributes

from opentelemetry import trace
from opentelemetry.trace import SpanKind, Status, StatusCode

tracer = trace.get_tracer("order-service", "1.0.0")

async def fulfill_order(order_id: str) -> FulfillmentResult:
    """Full span lifecycle with proper error handling."""
    
    with tracer.start_as_current_span(
        "order.fulfill",
        kind=SpanKind.INTERNAL,
        attributes={
            "app.order.id": order_id,
            "app.service.component": "fulfillment",
        }
    ) as span:
        
        # Add events (timestamped annotations within span)
        span.add_event("fulfillment.started", {"queue_depth": get_queue_depth()})
        
        # Child span for DB operation
        with tracer.start_as_current_span("db.get_order") as db_span:
            db_span.set_attribute("db.system", "postgresql")
            db_span.set_attribute("db.operation", "SELECT")
            order = await db.fetch_order(order_id)
        
        if not order:
            span.set_attribute("app.order.found", False)
            span.set_status(Status(StatusCode.ERROR, "Order not found"))
            raise OrderNotFoundError(order_id)
        
        span.set_attribute("app.order.found", True)
        span.set_attribute("app.order.items_count", len(order.items))
        
        # External service call
        with tracer.start_as_current_span(
            "warehouse.reserve_items",
            kind=SpanKind.CLIENT
        ) as ext_span:
            ext_span.set_attribute("rpc.system", "grpc")
            ext_span.set_attribute("rpc.service", "WarehouseService")
            ext_span.set_attribute("rpc.method", "ReserveItems")
            reservation = await warehouse_client.reserve(order.items)
        
        span.add_event("items.reserved", {
            "reservation_id": reservation.id,
            "items_count": len(order.items)
        })
        
        return FulfillmentResult(order=order, reservation=reservation)

OTel Collector: Full Production Config

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  
  # Prometheus metrics scraping
  prometheus:
    config:
      global:
        scrape_interval: 15s
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              target_label: __address__
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: '$$1:$$2'   # $$ escapes $ from the collector's env-var expansion
  
  # Host metrics (CPU, memory, disk, network)
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      memory: {}
      network: {}

processors:
  # Resource detection: auto-populate cloud metadata
  resourcedetection:
    detectors: [env, gcp, aws, azure]
    timeout: 5s
  
  # Add k8s metadata to all telemetry
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
  
  # Tail-based sampling
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 500}
      - name: keep-10pct
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
  
  # Filter out health check noise
  filter/health:
    traces:
      span:
        - 'attributes["http.target"] == "/health"'
        - 'attributes["http.target"] == "/metrics"'
  
  batch:
    timeout: 5s
    send_batch_size: 512
  
  memory_limiter:
    limit_mib: 1024
    spike_limit_mib: 256
    check_interval: 5s

exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
    tls:
      insecure_skip_verify: false
  
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    labels:
      resource:
        service.name: "service_name"
        k8s.namespace.name: "namespace"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, k8sattributes, filter/health, tail_sampling, batch]
      exporters: [otlp/tempo]
    
    metrics:
      receivers: [otlp, prometheus, hostmetrics]
      processors: [memory_limiter, resourcedetection, k8sattributes, batch]
      exporters: [prometheusremotewrite]
    
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, k8sattributes, batch]
      exporters: [loki]
  
  telemetry:
    logs:
      level: warn
    metrics:
      address: ":8888"  # Collector's own metrics

Sampling Strategies

Head-based sampling (decision at trace start):
  ✅ Low overhead — decision made immediately
  ❌ Can't keep traces based on outcome (error discovered later)
  Use: High-volume services where you can afford to drop non-errors
  
  Types:
    Always-on:       sample every trace (dev/low-volume)
    TraceID-ratio:   sample X% deterministically by trace ID
    Parent-based:    inherit parent's sampling decision (distributed default)
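
TraceID-ratio sampling is deterministic: the decision is a pure function of the trace ID, so every
service that sees the same trace ID makes the same call. A sketch of the idea (the Python SDK's
TraceIdRatioBased compares the low 64 bits of the trace ID against a bound derived from the ratio;
this mirrors that logic, it is not the SDK's exact code):

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic head-sampling decision from the trace ID alone."""
    bound = int(ratio * (1 << 64))       # keep IDs whose low 64 bits fall under the bound
    low64 = int(trace_id_hex[-16:], 16)  # last 16 hex chars = low 64 bits
    return low64 < bound
```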

Tail-based sampling (decision after trace complete):
  ✅ Can keep 100% of errors, slow traces, etc.
  ❌ Requires buffering all spans in collector (memory cost)
  ❌ Complex multi-collector setup (all spans for a trace must reach same collector)
  Use: When error traces must be kept; when you have budget for collector memory

Recommended production setup:
  SDK: ParentBased(root=TraceIdRatioBased(0.5))  ← 50% at SDK level
  Collector: tail_sampling to further keep errors  ← 100% of errors within the 50%

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    ParentBased, TraceIdRatioBased, ALWAYS_ON, ALWAYS_OFF
)

# Production: 50% sampling, always keep if parent sampled
sampler = ParentBased(
    root=TraceIdRatioBased(0.5),
    remote_parent_sampled=ALWAYS_ON,    # Always sample if parent did
    remote_parent_not_sampled=ALWAYS_OFF,
)

tracer_provider = TracerProvider(sampler=sampler, resource=resource)
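
Layered like this, the baseline keep-rate for ordinary traces is the product of the two stages
(errors and slow traces are kept by their own tail policies regardless). A quick sanity check:

```python
def effective_baseline_rate(sdk_ratio: float, collector_pct: float) -> float:
    """Keep-rate for non-error traces after SDK head sampling and the
    collector's probabilistic tail policy are both applied."""
    return sdk_ratio * (collector_pct / 100.0)

print(effective_baseline_rate(0.5, 10))  # 0.05 → 5% of ordinary traces survive
```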

Exemplars: Linking Metrics to Traces

Exemplars are specific trace IDs embedded in metric data points, allowing you to jump from a latency
spike on a Grafana panel directly to the offending trace in Tempo/Jaeger.

import time
from opentelemetry import metrics, trace

tracer = trace.get_tracer("order-service")

# Exemplars are enabled by default when there's an active span
# The SDK automatically attaches trace_id + span_id to histogram data points

meter = metrics.get_meter("order-service")
latency_histogram = meter.create_histogram(
    name="order.processing.duration",
    description="Time to process an order",
    unit="ms",
)

async def process_order(order_id: str):
    start = time.monotonic()
    with tracer.start_as_current_span("order.process") as span:
        result = await _do_process(order_id)
        duration_ms = (time.monotonic() - start) * 1000
        # Exemplar is auto-attached from current span context
        latency_histogram.record(duration_ms, attributes={"order.type": result.type})
    return result

Resource Attributes

from opentelemetry.sdk.resources import Resource, OTELResourceDetector
from opentelemetry.semconv.resource import ResourceAttributes as RA

resource = Resource.create({
    # Required: service identification
    RA.SERVICE_NAME: "order-api",
    RA.SERVICE_VERSION: os.environ.get("APP_VERSION", "unknown"),
    RA.SERVICE_NAMESPACE: "ecommerce",
    RA.SERVICE_INSTANCE_ID: socket.gethostname(),  # Pod name in k8s
    
    # Deployment context
    RA.DEPLOYMENT_ENVIRONMENT: os.environ.get("ENVIRONMENT", "development"),
    
    # Cloud provider (auto-detected by resourcedetection processor, but can hardcode)
    RA.CLOUD_PROVIDER: "gcp",
    RA.CLOUD_REGION: "us-central1",
    
    # Container info (auto-detected in k8s)
    RA.CONTAINER_NAME: os.environ.get("HOSTNAME", "unknown"),
})

Anti-Patterns

Putting user data in span names — span names become high-cardinality metric dimensions
Setting span status to ERROR for business errors — only set ERROR for unexpected failures (exceptions)
Using ALWAYS_ON sampling in high-volume production — will crush your tracing backend
Missing service.name resource attribute — backends won't know which service the data came from
OTel Collector without memory_limiter processor — collector will OOM under load
Ignoring semantic conventions — db.system=postgres instead of db.system=postgresql breaks dashboards
Recording secrets/PII in span attributes — spans are often stored unencrypted in backends
tail_sampling on collector without enough memory — all unsampled spans are buffered in RAM
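
For the secrets/PII point, a belt-and-braces guard at instrumentation time can be as simple as a
key-based scrub before attributes are set. This is a hypothetical helper (in production the
Collector's redaction/transform processors are the more robust place to do this):

```python
SENSITIVE_KEY_PARTS = {"password", "secret", "token", "authorization", "ssn", "card"}

def scrub_attributes(attrs: dict) -> dict:
    """Redact values whose keys look sensitive, before any span.set_attribute calls."""
    return {
        key: "[REDACTED]" if any(part in key.lower() for part in SENSITIVE_KEY_PARTS) else value
        for key, value in attrs.items()
    }

print(scrub_attributes({
    "app.user.id": "u-42",
    "http.request.header.authorization": "Bearer abc",
}))
```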

Quick Reference

Signal types:
  Traces   → SpanKind: SERVER, CLIENT, INTERNAL, PRODUCER, CONSUMER
  Metrics  → Counter, UpDownCounter, Histogram, ObservableGauge
  Logs     → SeverityText, SeverityNumber, Body, Attributes

OTel Collector ports:
  4317  → OTLP gRPC
  4318  → OTLP HTTP
  8888  → Collector metrics (Prometheus)
  55679 → zpages (debug UI)
  13133 → health_check extension

OTLP environment variables (SDK auto-reads):
  OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
  OTEL_SERVICE_NAME=my-service
  OTEL_TRACES_SAMPLER=parentbased_traceidratio
  OTEL_TRACES_SAMPLER_ARG=0.1
  OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production

Collector debug tips:
  # Enable zpages extension to see trace/span summaries
  extensions:
    zpages:
      endpoint: 0.0.0.0:55679
  # Visit: http://collector:55679/debug/tracez
