Scale your self-hosted deployment infrastructure

Retool's self-hosted Docker image consists of several containers required to run your deployment. Before you continue, you should review Retool's self-hosted architecture to understand which services need scaling as usage increases.

Scale your Retool deployment

Retool requires you to self-host production deployments on Kubernetes with Helm. For new deployments, Retool provides Terraform blueprints to provision all required infrastructure and deploy automatically. For non-production testing and development, you can use Docker.

Retool is packaged as a single stateless Docker container. The only dependency is PostgreSQL (version 13 or later) for the platform database which is used to store data such as user information, organization settings, audit logs, and applications.

To scale a Retool instance, follow these high-level steps:

  1. Host the PostgreSQL database on an external system, e.g., AWS RDS. The separation of database and application allows you to independently scale and manage each service.
  2. Start multiple Retool containers that use the same Postgres database. The number of containers you deploy depends on your traffic and resource requirements. We recommend scaling the api container. You can do this by updating the replica count in your container orchestration service of choice.
  3. Use a load balancer to route traffic between the Retool server containers to ensure high availability.

Run only one replica of the jobs-runner in each Retool environment because it runs database migrations and other background tasks that must operate as a singleton. You can run as many replicas as necessary of the containers or pods for the other service types.

Agent sandbox pool

The agent sandbox is the only Retool service that autoscales without manual configuration. The agent-sandbox-controller provisions and recycles short-lived agent-sandbox-job pods as editor sessions open and close.

How it scales

Each active editing session requires approximately one agent-sandbox-job pod. When a builder opens an app, the controller assigns a pod from the prewarm pool or provisions a new one. When the session ends, the pod is recycled.

| Component | CPU | Memory | Replicas |
| --- | --- | --- | --- |
| agent-sandbox-controller | ¼ core | 256 MiB | 1 |
| agent-sandbox-proxy | ¼ core | 256 MiB | 1 |
| agent-sandbox-job pods | 1 core | 2 GiB | ~1 per active editor |

The controller and proxy do not need to be replicated.

Prewarm pool

A pool of idle pods is kept ready to eliminate cold-start latency when builders open new sessions. The default is 5 pods:

values.yaml
agentSandbox:
  controller:
    scaling:
      prewarmPoolSize: 5

Set prewarmPoolSize to 0 if startup latency is acceptable and reducing idle resource consumption is a priority. Builders still get a sandbox session, but they wait for a pod to provision, which can take a minute or more depending on your environment.

Node autoscaling

The sandbox controller scales the pod pool automatically, but it can only provision pods up to your cluster's current node capacity. For the sandbox to scale beyond existing node resources, your cluster needs node autoscaling enabled:

  • Amazon EKS: Cluster Autoscaler or Karpenter. Karpenter requires additional configuration; refer to the cloud-specific notes in the Kubernetes deployment guide.
  • GKE: GKE cluster autoscaler or Autopilot.
  • AKS: AKS cluster autoscaler.

Without node autoscaling, the sandbox pool is limited by your current cluster capacity. If that ceiling is reached, new editing sessions won't get a working sandbox until capacity is freed or additional nodes are provisioned.

Capacity planning

To estimate sandbox resource requirements at peak usage:

  1. Count the maximum number of builders you expect to be simultaneously active.
  2. Add prewarmPoolSize (default: 5) for idle pool overhead.
  3. Multiply by per-pod resources: 1 CPU core and 2 GiB memory.

For example, 20 concurrent editors with the default prewarm pool size require capacity for 25 pods (25 CPU cores and 50 GiB memory for the sandbox alone).

The default maximum total sandbox job pods is 50 (agentSandbox.controller.scaling.maxTotalJobs). If your calculation exceeds this, raise the cap explicitly; otherwise the pool will silently stop growing and editors beyond the cap won't get a working sandbox session:

values.yaml
agentSandbox:
  controller:
    scaling:
      maxTotalJobs: 75 # raise from default 50 as needed

On Docker Compose, the sandbox pool is limited by the host VM's total memory and CPU. There is no node autoscaling on a single VM. If you expect more than a small number of concurrent editors, plan to size the VM accordingly or migrate to a Kubernetes deployment.
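The capacity estimate described in this section can be sketched as a small calculation. This is an illustrative helper, not part of Retool: the function and class names are hypothetical, the per-pod figures and defaults are the ones quoted above, and it assumes prewarm pods count toward the maxTotalJobs cap.

```python
from dataclasses import dataclass

# Per-pod figures and defaults quoted in this guide.
CPU_PER_POD = 1            # cores per agent-sandbox-job pod
MEMORY_PER_POD_GIB = 2     # GiB per agent-sandbox-job pod
DEFAULT_PREWARM_POOL_SIZE = 5
DEFAULT_MAX_TOTAL_JOBS = 50


@dataclass
class SandboxEstimate:
    pods: int
    cpu_cores: int
    memory_gib: int
    exceeds_cap: bool  # assumption: prewarm pods count toward the cap


def estimate_sandbox_capacity(
    peak_editors: int,
    prewarm_pool_size: int = DEFAULT_PREWARM_POOL_SIZE,
    max_total_jobs: int = DEFAULT_MAX_TOTAL_JOBS,
) -> SandboxEstimate:
    """Estimate peak sandbox pods, CPU, and memory (hypothetical helper)."""
    pods = peak_editors + prewarm_pool_size
    return SandboxEstimate(
        pods=pods,
        cpu_cores=pods * CPU_PER_POD,
        memory_gib=pods * MEMORY_PER_POD_GIB,
        exceeds_cap=pods > max_total_jobs,
    )


print(estimate_sandbox_capacity(20))
# SandboxEstimate(pods=25, cpu_cores=25, memory_gib=50, exceeds_cap=False)
```

If exceeds_cap is true for your expected peak, that is the signal to raise maxTotalJobs before the pool stops growing.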

Concurrency limits

Each agent thread runs in a dedicated sandbox session (pod). A user can have up to 5 concurrent threads open by default, which means up to 5 sandbox sessions held simultaneously. This per-user limit is separate from the deployment-wide pod cap set by agentSandbox.controller.scaling.maxTotalJobs in values.yaml.

When all of a user's sessions are active and they open a new thread, the thread fails to start rather than evicting an existing live session. If at least one session is idle, the system automatically reclaims the oldest idle session and assigns it to the new thread. The reclamation happens in the background and does not interrupt active work.
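The per-user behavior described above can be modeled roughly as follows. This is a toy sketch, not Retool's implementation: the class and field names are hypothetical, and it assumes "oldest idle session" means the idle session that was least recently active.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_SESSIONS_PER_USER = 5  # default per-user limit quoted in this guide


@dataclass
class Session:
    thread_id: str
    last_active: float  # timestamp of the most recent activity
    idle: bool = False


@dataclass
class UserSessions:
    limit: int = MAX_SESSIONS_PER_USER
    sessions: list[Session] = field(default_factory=list)

    def open_thread(self, thread_id: str, now: float) -> Optional[Session]:
        """Return a session for the new thread, or None if it cannot start."""
        if len(self.sessions) < self.limit:
            session = Session(thread_id, last_active=now)
            self.sessions.append(session)
            return session
        idle = [s for s in self.sessions if s.idle]
        if not idle:
            # Every session is live: the new thread fails to start.
            return None
        # Assumption: "oldest" means least recently active.
        oldest = min(idle, key=lambda s: s.last_active)
        oldest.thread_id = thread_id
        oldest.last_active = now
        oldest.idle = False
        return oldest
```

In this model a live session is never evicted; only sessions already marked idle are candidates for reassignment.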

Idle timeout

Sandboxes shut down automatically after a configurable idle period. The default is 2.5 hours. When a session times out, Retool releases the reserved CPU and memory back to the pool.

In high-traffic deployments where usage patterns are unpredictable, reducing the idle timeout to 30 minutes or less helps reclaim resources faster.

Frontend caching with a service worker

Retool installs a service worker in the browser that caches the application frontend and serves it on subsequent page loads. The service worker uses a stale-while-revalidate strategy: it returns the cached version of index.html immediately, then requests the latest version from Retool in the background. If the cached and latest versions differ, Retool automatically refreshes the page so users see the current build.

The cache is version-aware and automatically invalidates when you upgrade Retool. After an upgrade, users land on the previously cached version once, then receive the new version on the next request without needing to clear their browser cache. Users can bypass the cache with a hard refresh (Cmd+Shift+R / Ctrl+Shift+R).
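To illustrate the stale-while-revalidate pattern described above, here is a minimal Python model. It is not the service worker's code (which runs as JavaScript in the browser); the class name and the fetch_latest callback are hypothetical stand-ins, and the revalidation runs inline rather than in the background.

```python
from typing import Callable, Optional


class StaleWhileRevalidateCache:
    """Toy model of a stale-while-revalidate cache for a single document."""

    def __init__(self, fetch_latest: Callable[[], str]) -> None:
        self._fetch_latest = fetch_latest  # stands in for a network request
        self._cached: Optional[str] = None

    def load(self) -> tuple[str, bool]:
        """Return (content served, whether a refresh should follow)."""
        if self._cached is None:
            # Nothing cached yet: serve from the network and store the result.
            self._cached = self._fetch_latest()
            return self._cached, False
        served = self._cached
        # Revalidate. A browser would do this after responding to the page.
        latest = self._fetch_latest()
        self._cached = latest
        return served, latest != served


build = {"html": "build-1"}
cache = StaleWhileRevalidateCache(lambda: build["html"])
print(cache.load())        # ('build-1', False)
build["html"] = "build-2"  # simulate an upgrade
print(cache.load())        # ('build-1', True)
print(cache.load())        # ('build-2', False)
```

The second call mirrors the post-upgrade behavior: the stale build is served once, a refresh is signaled, and the next load returns the new build.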

The service worker is enabled by default in self-hosted Retool 3.334 and later. Admins can disable it from Settings > Beta if needed to diagnose page load issues.

Near-zero downtime upgrades

All MAJOR.MINOR release upgrades, such as upgrading from 3.334 to 4.0, perform database migrations. Patch release upgrades within the same MAJOR.MINOR release, such as 4.0.1 to 4.0.2, generally do not.
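As a rough aid when planning an upgrade, the rule above can be expressed as a version comparison. The helper names are hypothetical, and the parsing assumes version strings of the form MAJOR.MINOR or MAJOR.MINOR.PATCH with an optional suffix such as -stable.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR[.PATCH][-suffix]' into integers (PATCH defaults to 0)."""
    core = version.split("-", 1)[0]  # drop a suffix such as '-stable'
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))
    major, minor, patch = parts[:3]
    return major, minor, patch


def expects_migrations(current: str, target: str) -> bool:
    """True when the upgrade changes MAJOR.MINOR, which performs migrations."""
    return parse_version(current)[:2] != parse_version(target)[:2]


print(expects_migrations("3.334", "4.0"))    # True
print(expects_migrations("4.0.1", "4.0.2"))  # False
```

A False result only means migrations are not generally expected; it is not a guarantee that a patch release contains none.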

General upgrade strategy

Each container or pod attempts to perform database migrations at startup after an upgrade. In multi-container setups, upgrades must be coordinated carefully to prevent container deadlocks and minimize downtime.

Database migrations are backward-compatible, so older containers or pods can continue serving traffic while these migrations run. To upgrade, follow these steps:

  1. Stop all containers except jobs-runner.
  2. Start a new jobs-runner container using the latest MAJOR.MINOR.PATCH image tag, such as 4.0.1-stable.
  3. Wait for the database migrations to complete.
  4. Stop the old jobs-runner container.
  5. Start the remaining containers using the new image.
  6. Verify all containers are healthy before routing traffic.
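The ordering of the steps above can be captured as a small plan builder, which may be useful as a checklist generator. This sketch only produces a list of human-readable actions; it does not call Docker or Kubernetes, and the function name and action wording are hypothetical.

```python
def build_upgrade_plan(services: list[str], target_tag: str) -> list[str]:
    """Return the ordered upgrade actions for the given service names."""
    others = [s for s in services if s != "jobs-runner"]
    # Step 1: stop everything except jobs-runner.
    plan = [f"stop {name}" for name in others]
    # Steps 2-4: replace jobs-runner and let it run the migrations.
    plan += [
        f"start new jobs-runner with image tag {target_tag}",
        "wait for database migrations to complete",
        "stop old jobs-runner",
    ]
    # Steps 5-6: bring the other services back on the new image.
    plan += [f"start {name} with image tag {target_tag}" for name in others]
    plan.append("verify all containers are healthy before routing traffic")
    return plan


for step, action in enumerate(build_upgrade_plan(["api", "jobs-runner"], "4.0.1-stable"), 1):
    print(f"{step}. {action}")
```

Keeping the plan as data makes it easy to review the order before running the equivalent commands in your own orchestration tooling.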