Collect self-hosted telemetry data
Learn how to collect telemetry data from your self-hosted deployment instance.
This feature is in development. It is currently available for self-hosted instances deployed using retool-helm 6.2.0 or above, and running the most recent Edge or Stable release.
Organizations with Self-hosted deployment instances can collect telemetry data using either:
- A Retool-provided observability agent.
- A self-managed observability agent.
Retool's observability agent can forward data to both Retool and custom destinations. Retool also supports using your own observability agent if you prefer to have complete control over telemetry data collection.
Telemetry data collection is not enabled by default. You must configure your deployment instance to start collecting and forwarding telemetry data.
Configure Retool telemetry collector
When telemetry data is forwarded to Retool, your deployment's health is continually monitored. This gives Retool more insight into potential problems and improves the level of support you receive when diagnosing issues.
Use the Helm CLI or update your Helm configuration file to enable telemetry collection. Enabling telemetry sends data to Retool by default; set sendToRetool.enabled to false if you do not want to send data to Retool.
- Helm CLI
- Configuration file
helm upgrade --set telemetry.enabled=true --set telemetry.sendToRetool.enabled=false ...
...
telemetry:
  enabled: true
  sendToRetool:
    enabled: false
...
Specify telemetry version
The telemetry image uses the same release version as the main backend by default. If necessary, you can specify a version tag with the image.tag option:
telemetry:
  image:
    tag: 3.52.0-stable
If set, the telemetry image is fixed to the specified tag. Retool does not recommend including a tag unless you have a specific use case.
Collection and forwarding
The telemetry collector container contains two services: grafana-agent and vector. You can configure vector to send data to Retool, to custom destinations, or to both.
The telemetry collector uses a secure TLS connection with short-lived client certificates when sending data to Retool. Data is securely stored in Amazon S3 buckets in us-west-2 and is not shared with any other third parties or subprocessors.
Types of telemetry data
When enabled, your deployment produces the following types of telemetry data:
- Container Metrics (CPU, memory, network usage)
- Retool Runtime Metrics (frontend performance, backend request counts and latency)
- Container Logs (request logs, error logs, info logs)
Source name | Sent to Retool | Description |
---|---|---|
metrics_statsd | | Retool internal metrics. This includes frontend performance, backend request count, latency, etc. |
metrics_statsd_raw | | Same as metrics_statsd, but without the identifying tags added by the telemetry collector. |
metrics | | All collected metrics. This includes container health metrics and all metrics from metrics_statsd. |
container_logs | | All logs from the containers in the Retool deployment, with audit_logs and debug_logs excluded and deployment-identifying tags added. |
container_logs_raw | | All logs from the containers in the Retool deployment, without any exclusion, tagging, or other processing. |
audit_logs | | Retool audit logs printed to container stdout, if any. Requires audit logging to stdout to be enabled in your configuration. |
debug_logs | | Debug-level logs, if any. These are kept separate to avoid accidentally forwarding high volumes of debug logs to destinations. |
Send telemetry data to custom destinations
Retool supports sending telemetry data to any custom destination supported by Vector. Refer to the Vector sinks reference documentation for a complete list of supported sink types and configuration options.

Specify custom destinations using the customVectorConfig variable with sink configurations. Each sink must include an inputs list of the telemetry sources it forwards.
...
telemetry:
  customVectorConfig:
    sinks: ...
...
Example configuration for Datadog
The following example illustrates a telemetry configuration where data is forwarded to both Retool and Datadog.
...
telemetry:
  extraEnv:
    - name: DD_AGENT_HOST
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
  customVectorConfig:
    sinks:
      # forward statsd metrics to datadog-agent port 8125
      metrics_datadog:
        address: ${DD_AGENT_HOST}:8125
        buffer:
          when_full: drop_newest
        inputs:
          - metrics_statsd_raw
        mode: udp
        type: statsd
  enabled: true
...
Example configuration for Prometheus Remote Write
The following example illustrates a telemetry configuration where data is forwarded to both Retool and a Prometheus Remote Write destination.
...
telemetry:
  customVectorConfig:
    sinks:
      metrics_prometheus:
        type: prometheus_remote_write
        endpoint: https://prometheus:8087/api/v1/write
        inputs:
          - metrics
        buffer:
          when_full: drop_newest
  enabled: true
...
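Logs can be forwarded to custom destinations in the same way. The following sketch, which is an illustration rather than a documented configuration, assumes a Loki destination at a hypothetical address and uses Vector's loki sink to forward the processed container_logs source:

...
telemetry:
  customVectorConfig:
    sinks:
      # hypothetical example: forward processed container logs to a Loki instance
      logs_loki:
        type: loki
        endpoint: http://loki:3100
        inputs:
          - container_logs
        encoding:
          codec: json
        labels:
          source: retool
        buffer:
          when_full: drop_newest
  enabled: true
...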
Configure self-managed observability agents
Configuring a self-managed agent depends on a number of factors. Use the following information to configure your agent for telemetry data collection.
Logs
Your agent should collect Retool container logs in the same way as logs from any other container.
Format
Most Retool container logs are JSON-formatted. Retool recommends configuring your log collector to parse logs as JSON first, falling back to a simple string message format on failure.
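If your self-managed agent happens to be Vector, a minimal sketch of this JSON-first approach could look like the following; the transform and source names are hypothetical and not part of any documented configuration:

transforms:
  parse_retool_logs:
    type: remap
    inputs:
      - retool_logs   # hypothetical source collecting Retool container output
    source: |
      # Try to parse the line as JSON; on failure the raw string stays in .message
      parsed, err = parse_json(.message)
      if err == null {
        .parsed = parsed
      }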
Level
Set the LOG_LEVEL environment variable if you need to adjust log volume and verbosity.
Debugging
If you need to troubleshoot log collection, enable debug logs with the DEBUG environment variable. This will result in a very large number of logs. You should only use it for troubleshooting purposes.
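As a rough sketch, both variables might be set through your deployment's values or environment configuration; the env map below is an assumption about how your deployment passes environment variables to the backend containers, not a documented chart key:

# Sketch only: adjust to however your chart or compose file sets backend environment variables.
env:
  LOG_LEVEL: info    # example value; adjust log verbosity as needed
  DEBUG: "true"      # enable only while troubleshooting log collection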
Metrics
While telemetry data collection is in development, not all emitted metrics are documented, as they may change in future versions. Current metrics cover internal Retool runtime health, such as frontend performance timings, resource query timings and error rates, internal cache sizes, and workflow execution rates and timings.
For more information about telemetry metrics, refer to the Temporal documentation.
main-backend container
Retool backend containers are instrumented to emit metrics in the DogStatsD format. To collect these metrics, you must configure the agent with a statsd UDP listener which is specifically DogStatsD-aware.
Set the STATSD_HOST environment variable to the IP address or DNS name of your agent. If your agent uses a port other than 8125 (the default for most agents), set the STATSD_PORT environment variable to the correct port number.
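For example, the following sketch points the backend at an agent running in the cluster; the env map and the agent address are assumptions and should be adapted to your deployment:

# Sketch only: point the backend's DogStatsD client at your self-managed agent.
env:
  STATSD_HOST: dogstatsd-agent.monitoring.svc.cluster.local   # hypothetical agent address
  STATSD_PORT: "8125"                                         # only needed if not using the default port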
workflow-worker container
In addition to statsd metrics, the workflow-worker container is also instrumented to emit Temporal SDK metrics to an OpenTelemetry (OTLP) gRPC collector. Temporal SDK metrics can help you scale your deployment to keep up with Workflows traffic by tracking metrics such as queue latency.
If necessary, set the WORKFLOW_TEMPORAL_OPENTELEMETRY_COLLECTOR environment variable to the address of your OTLP gRPC endpoint. For example:
WORKFLOW_TEMPORAL_OPENTELEMETRY_COLLECTOR=http://localhost:4317
Tracing
Retool supports Datadog for collecting backend traces. If you use the Datadog agent, set the DD_TRACING_ENABLED environment variable to true.
In some cases, you may need to configure additional trace collection options, such as:
- Setting the trace agent hostname with the DD_TRACE_AGENT_HOSTNAME environment variable.
- Adjusting the sample rate with the DD_TRACE_SAMPLE_RATE environment variable.
Retool uses the dd-trace library and supports all available configuration parameters.
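Putting these together, a sketch of the tracing-related environment variables might look like the following; the env map, the agent hostname, and the sample rate are illustrative assumptions rather than documented defaults:

# Sketch only: Datadog tracing settings for the Retool backend containers.
env:
  DD_TRACING_ENABLED: "true"
  DD_TRACE_AGENT_HOSTNAME: datadog-agent.monitoring.svc.cluster.local   # hypothetical agent address
  DD_TRACE_SAMPLE_RATE: "0.2"                                           # illustrative value; tune for your traffic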