Collect self-hosted telemetry data

Learn how to collect telemetry data from your self-hosted deployment instance.

This feature is in development. It is currently available for self-hosted instances deployed using retool-helm 6.2.0 or above, and running the most recent Edge or Stable release.

Organizations with self-hosted deployment instances can collect telemetry data using either:

A Retool-provided observability agent.
A self-managed observability agent.

Retool's observability agent can forward data to both Retool and custom destinations. Retool also supports using your own observability agent if you prefer to have complete control over telemetry data collection.

Telemetry data collection is not enabled by default. You must configure your deployment instance to start collecting and forwarding telemetry data.

Configure Retool telemetry collector

When telemetry data is forwarded to Retool, your deployment's health is continually monitored. This gives Retool more insight into potential issues and improves the level of support when diagnosing them.

Use the Helm CLI or update your Helm configuration file to enable telemetry collection. This sends data to Retool by default. Set telemetry.sendToRetool.enabled to false if you do not want to send data to Retool.

Helm CLI

helm upgrade --set telemetry.enabled=true --set telemetry.sendToRetool.enabled=false ...

Configuration file

...
telemetry:
  enabled: true
  sendToRetool:
    enabled: false
...

Specify telemetry version

The telemetry image uses the same release version as the main backend by default. If necessary, you can pin a specific version with the image.tag option:

telemetry:
  image:
    tag: 3.52.0-stable

If set, the telemetry image is fixed to the specified tag. Retool does not recommend including a tag unless you have a specific use case.

Collection and forwarding

The telemetry collector container contains two services: grafana-agent and vector. You can configure vector to send data to either Retool or to custom destinations.

The telemetry collector uses a secure TLS connection with short-lived client certificates when sending data to Retool. Data is securely stored in Amazon S3 buckets in us-west-2 and is not shared with any other third parties or subprocessors.

Types of telemetry data

When enabled, your deployment produces the following types of telemetry data:

Container Metrics (CPU, memory, network usage)
Retool Runtime Metrics (frontend performance, backend request counts and latency)
Container Logs (request logs, error logs, info logs)

Source name	Sent to Retool	Description
`metrics_statsd`		Retool internal metrics. This includes frontend performance, backend request count, latency, etc.
`metrics_statsd_raw`		Same as `metrics_statsd`, but without any identifying tags added by the telemetry collector.
`metrics`		All collected metrics. This includes container health metrics and all metrics from `metrics_statsd`.
`container_logs`		All logs from the containers in the Retool deployment, with `audit_logs` and `debug_logs` excluded and deployment-identifying tags added.
`container_logs_raw`		All logs from the containers in the Retool deployment, without any exclusion, tagging, or other processing done.
`audit_logs`		Retool audit logs which are printed to container stdout, if any. Requires the relevant config to enable that feature.
`debug_logs`		Debug level logs, if any. These are separated so as to avoid accidentally forwarding high volumes of debug logs to destinations.

Send telemetry data to custom destinations

Retool supports sending to any custom destination supported by Vector. Refer to the Vector sinks reference documentation for a complete list of supported sink types and configuration.

Specify custom destinations using the customVectorConfig variable with sink configurations. Each sink must list, in its inputs field, the telemetry sources it forwards.

...
telemetry:
  customVectorConfig:
    sinks: ...
...
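Because every sink has to name the sources it forwards, it can be useful to sanity-check a sinks mapping before applying it. The following Python sketch is illustrative only and is not part of Retool or Vector. It assumes the sinks have already been loaded into a plain dictionary and that inputs only reference the source names listed in the table above. The validate_sinks helper is a hypothetical name.

```python
# Source names taken from the table of telemetry sources in this document.
KNOWN_SOURCES = {
    "metrics_statsd", "metrics_statsd_raw", "metrics",
    "container_logs", "container_logs_raw", "audit_logs", "debug_logs",
}


def validate_sinks(sinks):
    """Return a list of problems found in a customVectorConfig.sinks mapping.

    Hypothetical helper: assumes each sink is a dict and that inputs refer
    only to the built-in telemetry sources.
    """
    problems = []
    for name, sink in sinks.items():
        inputs = sink.get("inputs") or []
        if not inputs:
            problems.append(f"{name}: no inputs listed")
        for source in inputs:
            if source not in KNOWN_SOURCES:
                problems.append(f"{name}: unknown source {source!r}")
    return problems


# Mirrors the shape of the Datadog example in this document.
example = {
    "metrics_datadog": {
        "type": "statsd",
        "mode": "udp",
        "inputs": ["metrics_statsd_raw"],
    }
}
print(validate_sinks(example))  # prints []
```

A check like this catches a sink with a missing inputs list or a misspelled source name before the configuration is deployed.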

Example configuration for Datadog

The following example illustrates a telemetry configuration where data is forwarded to Retool and Datadog.

...
telemetry:
  extraEnv:
    - name: DD_AGENT_HOST
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  customVectorConfig:
    sinks:
      # forward statsd metrics to datadog-agent port 8125
      metrics_datadog:
        address: ${DD_AGENT_HOST}:8125
        buffer:
          when_full: drop_newest
        inputs:
          - metrics_statsd_raw
        mode: udp
        type: statsd
  enabled: true
...

Example configuration for Prometheus Remote Write

The following example illustrates a telemetry configuration where data is forwarded to Retool and a Prometheus Remote Write destination.

...
telemetry:
  customVectorConfig:
    sinks:
      metrics_prometheus:
        type: prometheus_remote_write
        endpoint: https://prometheus:8087/api/v1/write
        inputs:
          - metrics
        buffer:
          when_full: drop_newest
  enabled: true
...

Configure self-managed observability agents

Configuring a self-managed agent depends on a number of factors. Use the following information to configure your agent for telemetry data collection.

Logs

Your agent should collect logs in the same way as any other container.

Format

Most Retool container logs are JSON-formatted. Retool recommends configuring your log collector to parse each log as JSON first and to fall back to a plain string message if parsing fails.
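The JSON-first approach with a string fallback can be sketched in Python as follows. This is a minimal illustration of the parsing logic, not the configuration for any particular log collector. The parse_log_line name and the "message" key used for unparsed lines are assumptions.

```python
import json


def parse_log_line(line):
    """Parse a container log line as JSON, falling back to a plain message."""
    text = line.rstrip("\n")
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: keep the raw text as a simple string message.
        return {"message": text}
    # Valid JSON that is not an object (for example a bare number) is also
    # treated as a plain message so every record has the same shape.
    if not isinstance(record, dict):
        return {"message": text}
    return record
```

Most log collectors offer an equivalent setting or pipeline step. The point is that a parse failure should keep the original line rather than drop it.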

Level

Set the LOG_LEVEL environment variable if you need to adjust log volume and verbosity.

Debugging

If you need to troubleshoot log collection, enable debug logs with the DEBUG environment variable. This will result in a very large number of logs. You should only use it for troubleshooting purposes.

Metrics

Telemetry is still in development, so not all emitted metrics are documented yet and they may change in future versions. Current metrics cover internal Retool runtime health, such as frontend performance timings, resource query timings and error rates, internal cache sizes, and workflow execution rates and timings.

For more information about telemetry metrics, refer to the Temporal documentation.

main-backend container

Retool backend containers are instrumented to emit metrics in the DogStatsD format. To collect these metrics, you must configure the agent with a StatsD UDP listener that is DogStatsD-aware.

Set the STATSD_HOST environment variable to the IP or DNS name of your agent. If your agent is using a port other than 8125 (the default for most agents), set the STATSD_PORT environment variable to the correct port number.
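DogStatsD extends plain StatsD datagrams with an optional tag section, which is why the listener needs to be DogStatsD-aware. The Python sketch below illustrates the datagram layout (name:value|type, optionally followed by |@sample_rate and |#tags) and how the two environment variables combine into a target address. It is not Retool's implementation. The metric name, tag values, and helper names are hypothetical.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: str
    kind: str
    sample_rate: float = 1.0
    tags: dict = field(default_factory=dict)


def parse_dogstatsd(datagram):
    """Parse one DogStatsD line: name:value|type[|@rate][|#tag:val,...]."""
    name, sep, rest = datagram.partition(":")
    fields = rest.split("|")
    if not sep or not name or len(fields) < 2:
        raise ValueError(f"not a statsd datagram: {datagram!r}")
    metric = Metric(name=name, value=fields[0], kind=fields[1])
    for extra in fields[2:]:
        if extra.startswith("@"):
            metric.sample_rate = float(extra[1:])
        elif extra.startswith("#"):
            # The tag section is the DogStatsD extension to plain StatsD.
            for tag in extra[1:].split(","):
                key, _, val = tag.partition(":")
                metric.tags[key] = val
    return metric


def statsd_address(env=os.environ):
    """Combine STATSD_HOST and STATSD_PORT, assuming 8125 when no port is set."""
    return env.get("STATSD_HOST"), int(env.get("STATSD_PORT", "8125"))


# Hypothetical metric name and tags, for illustration only.
metric = parse_dogstatsd("example.requests:1|c|@0.5|#env:prod,region:us")
print(metric.tags)  # prints {'env': 'prod', 'region': 'us'}
```

A listener that only understands plain StatsD may reject or mangle the tag section, so it is worth confirming DogStatsD support in your agent's documentation.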

workflow-worker container

In addition to StatsD metrics, the workflow-worker container is instrumented to emit Temporal SDK metrics to an OpenTelemetry (OTLP) gRPC collector. Temporal SDK metrics can help you scale your deployment to keep up with Workflows traffic by tracking metrics such as queue latency.

If necessary, set the WORKFLOW_TEMPORAL_OPENTELEMETRY_COLLECTOR environment variable to the address of your OTLP gRPC endpoint. For example:

WORKFLOW_TEMPORAL_OPENTELEMETRY_COLLECTOR=http://localhost:4317
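If metrics do not arrive, one quick check is whether the configured address is well-formed and accepts TCP connections from the container. The Python sketch below is a troubleshooting aid under that assumption, not a Retool tool. The otlp_endpoint and is_reachable names are hypothetical, and a successful connection only shows that something is listening, not that it speaks OTLP.

```python
import socket
from urllib.parse import urlsplit


def otlp_endpoint(value):
    """Split a collector URL such as http://localhost:4317 into (host, port)."""
    parts = urlsplit(value)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {value!r}")
    return parts.hostname, parts.port


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_reachable(*otlp_endpoint("http://localhost:4317")) reports whether anything is listening on the address from the example above.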

Tracing

Retool supports Datadog for collecting backend traces. If you use the Datadog agent, set the DD_TRACING_ENABLED environment variable to true.

In some cases, you may need to configure additional trace collection options, such as:

Setting the trace agent hostname with the DD_TRACE_AGENT_HOSTNAME environment variable.
Adjusting the sample rate with the DD_TRACE_SAMPLE_RATE environment variable.

Retool uses the dd-trace library and supports all available configuration parameters.
