Cosmo router supports exporting tracing and metrics via OpenTelemetry. By default, both are exported to Cosmo Cloud, but these can be configured with additional exporters or disable the default ones. Both http and grpc are supported.

telemetry:

  service_name: "cosmo-router" # Service name for traces and metrics

  tracing:

    enabled: true

    # If no exporters are configured, telemetry is exported to Cosmo Cloud

    exporters:

    - endpoint: https://cosmo-otel.wundergraph.com # The default

      exporter: "http" # or "grpc" with port 4317

      headers: {Authorization: Bearer ${ROUTER_TOKEN}}

  metrics:

    otlp:

      enabled: true

      # If no exporters are configured, telemetry is exported to Cosmo Cloud

      exporters:

      - endpoint: https://cosmo-otel.wundergraph.com # The default

        exporter: "http" # or "grpc" with port 4317

        headers: {Authorization: Bearer ${ROUTER_TOKEN}}

If no exporters are configured, the default one is used instead (set by the DEFAULT_TELEMETRY_EXPORTER environment variable)

The router can also expose Prometheus metrics. It works with the same OTEL metrics we export over OTEL. See Metrics & Monitoring

Exclude certain metrics and labels

Excluding certain metrics and labels can significantly reduce the cardinality of the collected telemetry data, allowing for a tailored setup that aligns with your specific monitoring needs and minimizes unnecessary data collection. These exclusion options can be easily configured within the otlp section of the router config.

We support a Go Regex strings. You can test your Regex at https://regex101.com/.

To handle OTLP metrics, which are separated by dots, you need to escape the . character in the regular expression.

telemetry:

  metrics:

    otlp:

      exclude_metrics:

        - "router\.http\.request\.duration_milliseconds" # Exclude the full histogram

      exclude_metric_labels:

        - "wg\.client\.version"

Limits

High metric cardinality can lead to performance issues by consuming excessive resources and slowing down data processing. When too many distinct metric labels are generated, the system might struggle to manage the data efficiently. To mitigate this, we have set a default hard cardinality limit of 2000. This limit helps to ensure that the metrics remain manageable and that the performance of our system is not adversely affected.

Once the limit is reached, all further datapoints to a metric will be stored without attributes.

The default OpenTelemetry Collector ingests requests up to 4 MB by default via gRPC. If you are experiencing high cardinality in your metrics, it’s necessary to adjust the collector’s gRPC limits to accommodate larger requests.

We recommend at least 12 MB

Tracing

Multiple exporters

You can configure multiple exporters. A common case is to forward telemetry data to Cosmo Cloud and e.g. Datadog Agent which has native support to ingest OpenTelemetry data.

telemetry:

  service_name: "cosmo-router" # Service name for traces and metrics


  # OpenTelemetry Tracing

  tracing:

    enabled: true

    # If no exporters are defined, the default one is used

    exporters:

    # Cosmo Cloud

    - endpoint: https://cosmo-otel.wundergraph.com

      exporter: "http"

      headers: {Authorization: Bearer ${MY_AUTH_TOKEN}}

    # Datadog Agent

    - endpoint: http://datadog-agent:4318

      exporter: "http"


  # OpenTelemetry Metrics

  metrics:

    otlp:

      enabled: true

      # If no exporters are defined, the default one is used

      exporters:

        - exporter: http # or grpc

          disabled: false

          endpoint: https://my-otel-collector.example.com

Trace Propagation

Propagation is the mechanism that allows the movement of data between services and processes. As a user of the router, you will interact with them in the form of headers to pass the trace context across service boundaries. By default, we enable the widely-used Trace-Context specification. You can disable or enable different specifications according to your needs.

telemetry:

  tracing:

    propagation:

      # https://www.w3.org/TR/trace-context/

      trace_context: true

      # https://www.w3.org/TR/baggage/

      baggage: false

      # https://www.jaegertracing.io/ (compliant with opentracing)

      jaeger: false

      # https://github.com/openzipkin/b3-propagation (zipkin)

      b3: false

Example: Enable B3 propagation

If you want to enable B3 for example, you can set b3 to true but you also need to add the trace headers to the CORS config to not run into issues.

# Cross-Origin Resource Sharing (CORS)

cors:

  allow_headers:

    - b3

GraphQL variables

GraphQL variables are useful for debugging to replay queries but they can pose a potential risk because they include request data. To mitigate this, you have to explicitly opt in. In the future, we will provide tools to redact specific arguments.

telemetry:

  tracing:

      export_graphql_variables: true # TRACING_EXPORT_GRAPHQL_VARIABLES

This enables the option to replay GraphQL queries with variables in the Studio.

Subscriptions

Tracing has not yet been implemented for subscriptions. If you require this feature, please do not hesitate to contact us.

Trace ID Response header

This configuration allows you to include trace ID in the response headers, where the name of the header would be the value provided.

telemetry:

  tracing:

    response_trace_id:

      enabled: true

      header_name: "my-trace-id" # default: "x-wg-trace-id"

FAQ