Collect self-hosted telemetry data

Learn how to collect telemetry data from your self-hosted deployment instance.

This feature is in development. It is currently available for self-hosted instances deployed using retool-helm 6.2.0 or above, and running the most recent Edge or Stable release.

Organizations with Self-hosted deployment instances can collect telemetry data using either Retool's observability agent or a self-managed observability agent.

Retool's observability agent can forward data to both Retool and custom destinations. If you prefer to have complete control over telemetry data collection, Retool also supports using your own observability agent.

Telemetry data collection is not enabled by default. You must configure your deployment instance to start collecting and forwarding telemetry data.

Configure Retool telemetry collector

When telemetry data is forwarded to Retool, your deployment's health is continually monitored. This gives Retool more insight into potential problems and improves the level of support when diagnosing issues.

Use the Helm CLI or update your Helm configuration file to enable telemetry collection. By default, this sends data to Retool. If you do not want to send data to Retool, set telemetry.sendToRetool.enabled to false.

helm upgrade --set telemetry.enabled=true --set telemetry.sendToRetool.enabled=false ...
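
Alternatively, you can set the equivalent values in your Helm configuration file. The following is a minimal sketch, assuming the same telemetry.enabled and telemetry.sendToRetool.enabled keys used in the command above:

telemetry:
  enabled: true
  sendToRetool:
    enabled: false   # set to true (or omit) to keep forwarding data to Retool
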
Specify telemetry version

The telemetry image uses the same release version as the main backend by default. If necessary, you can pin a specific version with the image.tag option:

telemetry:
  image:
    tag: 3.52.0-stable

If set, the telemetry image is fixed to the specified tag. Retool does not recommend including a tag unless you have a specific use case.

Collection and forwarding

The telemetry collector container runs two services: grafana-agent and vector. You can configure vector to send data to Retool, to custom destinations, or to both.

The telemetry collector uses a secure TLS connection with short-lived client certificates when sending data to Retool. Data is stored securely in Amazon S3 buckets in us-west-2 and is not shared with any other third parties or subprocessors.

Types of telemetry data

When enabled, your deployment produces the following types of telemetry data:

  • Container Metrics (CPU, memory, network usage)
  • Retool Runtime Metrics (frontend performance, backend request counts and latency)
  • Container Logs (request logs, error logs, info logs)

These map to the following telemetry sources, which you can reference when configuring custom destinations:

| Source name | Sent to Retool | Description |
| --- | --- | --- |
| metrics_statsd |  | Retool internal metrics, including frontend performance, backend request counts, latency, and so on. |
| metrics_statsd_raw |  | Same as metrics_statsd, but without any identifying tags added by the telemetry collector. |
| metrics |  | All collected metrics, including container health metrics and all metrics from metrics_statsd. |
| container_logs |  | All logs from the containers in the Retool deployment, with audit_logs and debug_logs excluded and deployment-identifying tags added. |
| container_logs_raw |  | All logs from the containers in the Retool deployment, without any exclusion, tagging, or other processing. |
| audit_logs |  | Retool audit logs printed to container stdout, if any. Requires the relevant configuration to enable that feature. |
| debug_logs |  | Debug-level logs, if any. These are separated to avoid accidentally forwarding high volumes of debug logs to destinations. |

Send telemetry data to custom destinations

Retool supports sending telemetry data to any custom destination supported by Vector. Refer to the Vector sinks reference documentation for a complete list of supported sink types and configuration options.

You specify custom destinations as sink configurations under the customVectorConfig variable. Each sink must include the list of telemetry sources it forwards.

...
telemetry:
  customVectorConfig:
    sinks: ...
...
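
For example, the following sketch adds a hypothetical sink named logs_console that writes the container_logs source to the collector's stdout using Vector's console sink. This can be useful for verifying that sources are flowing before you point them at a real destination; the sink name is arbitrary:

telemetry:
  customVectorConfig:
    sinks:
      # hypothetical sink: print container logs as JSON to the collector's stdout
      logs_console:
        type: console
        inputs:
          - container_logs
        encoding:
          codec: json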

Example configuration for Datadog

The following example illustrates a telemetry configuration where data is forwarded to both Retool and Datadog.

...
telemetry:
  enabled: true
  extraEnv:
    - name: DD_AGENT_HOST
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  customVectorConfig:
    sinks:
      # forward statsd metrics to datadog-agent port 8125
      metrics_datadog:
        address: ${DD_AGENT_HOST}:8125
        buffer:
          when_full: drop_newest
        inputs:
          - metrics_statsd_raw
        mode: udp
        type: statsd
...

Example configuration for Prometheus Remote Write

The following example illustrates a telemetry configuration where data is forwarded to both Retool and a Prometheus Remote Write destination.

...
telemetry:
  enabled: true
  customVectorConfig:
    sinks:
      metrics_prometheus:
        type: prometheus_remote_write
        endpoint: https://prometheus:8087/api/v1/write
        inputs:
          - metrics
        buffer:
          when_full: drop_newest
...

Configure self-managed observability agents

Configuring a self-managed agent depends on a number of factors. Use the following information to configure your agent for telemetry data collection.

Logs

Your agent can collect logs from Retool containers in the same way it collects logs from any other container.

Format

Most Retool container logs are JSON-formatted. Retool recommends configuring your log collector to parse logs as JSON first and fall back to a plain string message format when parsing fails.
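
As a sketch, if you happen to use Vector as your agent, a remap transform along these lines implements that parse-then-fall-back behavior. The source name retool_container_logs is hypothetical and depends on how you collect the container output:

transforms:
  parse_retool_logs:
    type: remap
    inputs:
      - retool_container_logs   # hypothetical log source
    source: |
      # Try to parse the line as JSON; on failure, keep the raw string in .message.
      parsed, err = parse_json(.message)
      if err == null {
        .log = parsed
      }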

Level

Set the LOG_LEVEL environment variable if you need to adjust log volume and verbosity.

Debugging

If you need to troubleshoot log collection, enable debug logs with the DEBUG environment variable. This produces a very large volume of logs, so only enable it while troubleshooting.
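
As an illustrative sketch, both variables could be set on the Retool containers like this; how you inject environment variables depends on your deployment method, and the level value shown is an assumption to verify against your release:

env:
  - name: LOG_LEVEL
    value: info       # assumed level name; adjust verbosity as needed
  - name: DEBUG
    value: "true"     # very verbose; enable only while troubleshooting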

Metrics

Not all emitted metrics are documented while telemetry support is in development, because they may change in future versions. Current metrics cover internal Retool runtime health, such as frontend performance timings, resource query timings and error rates, internal cache sizes, and workflow execution rates and timings.

For more information about the Temporal SDK metrics emitted by the workflow-worker container, refer to the Temporal documentation.

main-backend container

Retool backend containers are instrumented to emit metrics in the DogStatsD format. To collect these metrics, configure your agent with a StatsD UDP listener that is DogStatsD-aware.

Set the STATSD_HOST environment variable to the IP or DNS name of your agent. If your agent is using a port other than 8125 (the default for most agents), set the STATSD_PORT environment variable to the correct port number.
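
A sketch of those variables on the backend container, using a hypothetical agent address and non-default port:

env:
  - name: STATSD_HOST
    value: datadog-agent.monitoring.svc.cluster.local   # hypothetical agent hostname
  - name: STATSD_PORT
    value: "9125"                                        # only needed if the agent does not listen on 8125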

workflow-worker container

In addition to statsd metrics, the workflow-worker container is also instrumented to emit Temporal SDK metrics to an OpenTelemetry (OTLP) gRPC collector. Temporal SDK metrics can help you scale your deployment to keep up with Workflows traffic by tracking metrics such as queue latency.

If necessary, set the WORKFLOW_TEMPORAL_OPENTELEMETRY_COLLECTOR environment variable to the address of your OTLP gRPC endpoint. For example:

WORKFLOW_TEMPORAL_OPENTELEMETRY_COLLECTOR=http://localhost:4317
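
If your agent is the OpenTelemetry Collector, the matching receiver on the agent side might look like this sketch, which listens for OTLP over gRPC on the default port 4317. You still need to wire the receiver into a metrics pipeline with an exporter for your metrics backend:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317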

Tracing

Retool supports Datadog for collecting backend traces. If you use the Datadog agent, set the DD_TRACING_ENABLED environment variable to true.

In some cases, you may need to configure additional trace collection options, such as:

  • Setting the trace agent hostname with the DD_TRACE_AGENT_HOSTNAME environment variable.
  • Adjusting the sample rate with the DD_TRACE_SAMPLE_RATE environment variable.

Retool uses the dd-trace library and supports all of its available configuration parameters.
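
As an illustrative sketch, the tracing variables above might be set on the backend container like this; the agent hostname and sample rate are placeholder values:

env:
  - name: DD_TRACING_ENABLED
    value: "true"
  - name: DD_TRACE_AGENT_HOSTNAME
    value: datadog-agent.monitoring.svc.cluster.local   # placeholder agent hostname
  - name: DD_TRACE_SAMPLE_RATE
    value: "0.2"                                        # placeholder: keep 20% of traces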