Open Telemetry

Use OpenTelemetry with traces and metrics

Cosmo router supports exporting tracing and metrics via OpenTelemetry. By default, both are exported to Cosmo Cloud, but these can be configured with additional exporters or disable the default ones. Both http and grpc are supported.

config.yaml
telemetry:
  service_name: "cosmo-router" # Service name for traces and metrics
  tracing:
    enabled: true
    # If no exporters are configured, telemetry is exported to Cosmo Cloud
    exporters:
    - endpoint: https://cosmo-otel.wundergraph.com # The default
      exporter: "http" # or "grpc" with port 4317
      headers: {Authorization: Bearer ${ROUTER_TOKEN}}
  metrics:
    otlp:
      enabled: true
      # If no exporters are configured, telemetry is exported to Cosmo Cloud
      exporters:
      - endpoint: https://cosmo-otel.wundergraph.com # The default
        exporter: "http" # or "grpc" with port 4317
        headers: {Authorization: Bearer ${ROUTER_TOKEN}}

If no exporters are configured, the default one is used instead (set by the DEFAULT_TELEMETRY_EXPORTER environment variable)

The router can also expose Prometheus metrics. It works with the same OTEL metrics we export over OTEL. See Metrics & Monitoring

Tracing

Multiple exporters

You can configure multiple exporters. A common case is to forward telemetry data to Cosmo Cloud and e.g. Datadog Agent which has native support to ingest OpenTelemetry data.

config.yaml
telemetry:
  service_name: "cosmo-router" # Service name for traces and metrics
  
  # OpenTelemetry Tracing
  tracing:
    enabled: true
    # If no exporters are defined, the default one is used
    exporters:
    # Cosmo Cloud
    - endpoint: https://cosmo-otel.wundergraph.com
      exporter: "http"
      headers: {Authorization: Bearer ${MY_AUTH_TOKEN}}
    # Datadog Agent 
    - endpoint: http://datadog-agent:4318
      exporter: "http"
      
  # OpenTelemetry Metrics
  metrics:
    otlp:
      enabled: true
      # If no exporters are defined, the default one is used
      exporters:
        - exporter: http # or grpc
          disabled: false
          endpoint: https://my-otel-collector.example.com

Trace Propagation

Propagation is the mechanism that allows the movement of data between services and processes. As a user of the router, you will interact with them in the form of headers to pass the trace context across service boundaries. By default, we enable the widely-used Trace-Context specification. You can disable or enable different specifications according to your needs.

config.yaml
telemetry:
  tracing:
    propagation:
      # https://www.w3.org/TR/trace-context/
      trace_context: true
      # https://www.w3.org/TR/baggage/
      baggage: false
      # https://www.jaegertracing.io/ (compliant with opentracing)
      jaeger: false
      # https://github.com/openzipkin/b3-propagation (zipkin)
      b3: false   

Example: Enable B3 propagation

If you want to enable B3 for example, you can set b3 to true but you also need to add the trace headers to the CORS config to not run into issues.

config.yaml
# Cross-Origin Resource Sharing (CORS)
cors:
  allow_headers:
    - b3

GraphQL variables

GraphQL variables are useful for debugging to replay queries but they can pose a potential risk because they include request data. To mitigate this, you have to explicitly opt in. In the future, we will provide tools to redact specific arguments.

telemetry:
  tracing:
      export_graphql_variables: true # TRACING_EXPORT_GRAPHQL_VARIABLES

This enables the option to replay GraphQL queries with variables in the Studio.

Subscriptions

Tracing has not yet been implemented for subscriptions. If you require this feature, please do not hesitate to contact us.

Trace ID Response header

This configuration allows you to include trace ID in the response headers, where the name of the header would be the value provided.

telemetry:
  tracing:
    response_trace_id:
      enabled: true
      header_name: "my-trace-id" # default: "x-wg-trace-id"

FAQ

Can I export OTEL data from my application?

Yes, but this is currently limited to private backend applications e.g. subgraphs because we don't provide a secure mechanism to issue short-lived tokens in public applications like Web-Apps. Never expose your GRAPH_TOKEN to the public. A possible solution to make this work is to run an otelcollector and utilize the export functionality to forward telemetry data to Cosmo Cloud. This will enable you to configure your own authentication layer and observe malicious behaviors. It is also a recommended approach to ingest data to multiple OTEL platforms.

Why is my trace incomplete?

In certain conditions, it can happen that spans are not listed in the Studio. We have observed these cases.

  1. Platforms like Google Cloud Run automatically propagate trace headers according to the Trace-Context specification in every service call.

  2. Your client request has already set a trace context, but it did not send the span to Cosmo Cloud.

  3. External, uncontrolled clients sent trace headers.

In all cases, the issue is that not all spans will be sent to the Cosmo Platform. You have to ensure that all spans are sent to us so we can reconstruct the full trace.

Last updated