OpenTelemetry
Use OpenTelemetry with traces and metrics
The Cosmo router can export traces and metrics via OpenTelemetry. By default, both are exported to Cosmo Cloud, but you can configure additional exporters or disable the default ones. Both http and grpc are supported.
If no exporters are configured, the default one is used instead (set by the DEFAULT_TELEMETRY_EXPORTER environment variable).
The router can also expose Prometheus metrics. These are the same metrics that the router exports via OpenTelemetry. See Metrics & Monitoring.
Exclude certain metrics and labels
Excluding certain metrics and labels can significantly reduce the cardinality of the collected telemetry data, so that you only collect what your monitoring setup actually needs. The exclusion options are configured in the otlp section of the router config.
Exclusion patterns are Go regular expressions. You can test your regular expressions at https://regex101.com/.
OTLP metric names are separated by dots, so you need to escape the . character in the regular expression. An unescaped . matches any character, not only a literal dot.
Limits
High metric cardinality can cause performance problems because it consumes excessive resources and slows down data processing. When too many distinct label combinations are generated, the system may struggle to manage the data efficiently. To mitigate this, the router applies a default hard cardinality limit of 2000, which keeps the metrics manageable and protects the performance of the system.
Once the limit is reached, all further data points for a metric are stored without attributes.
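The behavior described above can be pictured with a small sketch. This is not the router's implementation; the limiter type and its string-encoded attribute keys are assumptions made only to illustrate how a hard cardinality limit falls back to attribute-free data points.

```go
package main

import "fmt"

// limiter tracks the distinct attribute sets seen for one metric.
// Hypothetical type for illustration only.
type limiter struct {
	limit int
	seen  map[string]struct{}
}

func newLimiter(limit int) *limiter {
	return &limiter{limit: limit, seen: make(map[string]struct{})}
}

// admit returns the attribute key under which a data point is recorded.
// Known keys, and new keys while below the limit, are kept as-is.
// Once the limit is reached, new keys collapse to "" (no attributes).
func (l *limiter) admit(attrKey string) string {
	if _, ok := l.seen[attrKey]; ok {
		return attrKey
	}
	if len(l.seen) >= l.limit {
		return ""
	}
	l.seen[attrKey] = struct{}{}
	return attrKey
}

func main() {
	// The documented default is 2000; 2 keeps the demonstration short.
	l := newLimiter(2)
	for _, k := range []string{"op=A", "op=B", "op=C", "op=A"} {
		fmt.Printf("%q -> %q\n", k, l.admit(k))
	}
}
```

In this sketch, attribute sets that were admitted before the limit was hit keep their attributes; only new combinations are collapsed.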
By default, the OpenTelemetry Collector accepts gRPC requests of up to 4 MB. If you are experiencing high cardinality in your metrics, you need to raise the collector’s gRPC limit (the max_recv_msg_size_mib setting of the OTLP gRPC receiver) to accommodate larger requests.
We recommend at least 12 MB.
Tracing
Multiple exporters
You can configure multiple exporters. A common case is to forward telemetry data both to Cosmo Cloud and to, for example, a Datadog Agent, which has native support for ingesting OpenTelemetry data.
Trace Propagation
Propagation is the mechanism that moves trace data between services and processes. As a user of the router, you interact with it in the form of headers that pass the trace context across service boundaries. By default, we enable the widely used W3C Trace Context specification. You can enable or disable individual specifications according to your needs.
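To make the header format concrete, here is a minimal sketch that parses a W3C Trace Context traceparent header (version 00: version, trace ID, parent span ID, and flags, separated by dashes). The parseTraceparent function and traceContext type are illustrative assumptions, and the validation is simplified compared to the full specification; in practice an OpenTelemetry SDK propagator handles this for you.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceContext holds the fields of a version-00 traceparent header.
// Hypothetical type for illustration only.
type traceContext struct {
	Version string
	TraceID string
	SpanID  string
	Sampled bool
}

// parseTraceparent splits "version-traceid-spanid-flags" and checks
// field lengths and hex encoding. Simplified: it does not reject
// all-zero IDs or uppercase hex, which the specification does.
func parseTraceparent(h string) (traceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 {
		return traceContext{}, errors.New("traceparent: expected 4 fields")
	}
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return traceContext{}, errors.New("traceparent: unexpected field length")
		}
		if _, err := hex.DecodeString(p); err != nil {
			return traceContext{}, errors.New("traceparent: field is not hex")
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return traceContext{
		Version: parts[0],
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 0x01, // lowest bit is the sampled flag
	}, nil
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Printf("trace=%s span=%s sampled=%v\n", tc.TraceID, tc.SpanID, tc.Sampled)
}
```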
Example: Enable B3 propagation
For example, to enable B3, set b3 to true. You also need to add the B3 trace headers to the CORS config so that browser clients are allowed to send them.
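As a rough guide to which headers are involved, the sketch below lists the request headers defined by B3 propagation (the multi-header form and the single b3 header) and merges them into an existing allow list. The withB3 helper is a hypothetical illustration; how the allow list is set in the router's CORS config is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// b3Headers are the request headers used by B3 propagation:
// the multi-header form plus the single-header form ("b3").
var b3Headers = []string{
	"X-B3-TraceId",
	"X-B3-SpanId",
	"X-B3-ParentSpanId",
	"X-B3-Sampled",
	"X-B3-Flags",
	"b3",
}

// withB3 returns a copy of allowed with every missing B3 header appended.
// Header names are compared case-insensitively. Hypothetical helper.
func withB3(allowed []string) []string {
	seen := make(map[string]bool, len(allowed))
	out := append([]string(nil), allowed...)
	for _, h := range allowed {
		seen[strings.ToLower(h)] = true
	}
	for _, h := range b3Headers {
		key := strings.ToLower(h)
		if !seen[key] {
			out = append(out, h)
			seen[key] = true
		}
	}
	return out
}

func main() {
	// Example allow list that already contains one B3 header.
	fmt.Println(strings.Join(withB3([]string{"Content-Type", "b3"}), ", "))
}
```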
GraphQL variables
GraphQL variables are useful for debugging because they let you replay queries, but they pose a potential risk because they contain request data. For this reason, exporting them is disabled unless you explicitly opt in. In the future, we will provide tools to redact specific arguments.
Opting in makes it possible to replay GraphQL queries with their variables in the Studio.
Subscriptions
Tracing has not yet been implemented for subscriptions. If you require this feature, please do not hesitate to contact us.
Trace ID Response header
This configuration lets you include the trace ID in the response headers. The name of the header is the value you provide.