Scale your self-hosted deployment infrastructure
Learn how to scale your Retool deployment and infrastructure.
Retool's self-hosted Docker image consists of several containers required to run your deployment. Before you continue, you should review Retool's self-hosted architecture to understand which services need scaling as usage increases.
Scale your Retool deployment
Retool requires you to self-host production deployments on Kubernetes with Helm. For new deployments, Retool provides Terraform blueprints to provision all required infrastructure and deploy automatically. For non-production testing and development, you can use Docker.
Retool is packaged as a single stateless Docker container. Its only dependency is PostgreSQL (version 13 or later) for the platform database, which stores data such as user information, organization settings, audit logs, and applications.
To scale a Retool instance, follow these high-level steps:
- Host the PostgreSQL database on an external system, e.g., AWS RDS. The separation of database and application allows you to independently scale and manage each service.
- Start multiple Retool containers that use the same Postgres database. The number of containers you deploy depends on your traffic and resource requirements. We recommend scaling the api container. You can do this by updating the replica count in your container orchestration service of choice.
- Use a load balancer to route traffic between the Retool server containers to ensure high availability.
You should only run one replica of the jobs-runner in each Retool environment because it runs database migrations and other background tasks that should only operate as a singleton. You can scale up as many replicas as necessary of the containers/pods that run the other service types.
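The singleton rule above can be checked before you apply a replica plan. The following is a minimal sketch, not part of Retool: `validate_replicas` is a hypothetical helper, and the plan is assumed to be a simple mapping from service name to replica count.

```python
def validate_replicas(replicas: dict) -> list:
    """Return a list of problems found in a planned replica layout.

    `replicas` maps a service name (for example "api" or "jobs-runner")
    to the number of replicas you intend to run. This is an illustrative
    check only; it does not talk to any orchestrator.
    """
    problems = []
    runner_count = replicas.get("jobs-runner")
    if runner_count is None:
        problems.append("jobs-runner is missing from the plan")
    elif runner_count != 1:
        # jobs-runner runs migrations and background tasks as a singleton.
        problems.append(f"jobs-runner must run exactly 1 replica, got {runner_count}")
    return problems


# Scaling api is fine; scaling jobs-runner is flagged.
print(validate_replicas({"api": 4, "jobs-runner": 1}))
print(validate_replicas({"api": 4, "jobs-runner": 2}))
```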
Agent sandbox pool
The agent sandbox is the only Retool service that autoscales without manual configuration. The agent-sandbox-controller provisions and recycles short-lived agent-sandbox-job pods as editor sessions open and close.
How it scales
Each active editing session requires approximately one agent-sandbox-job pod. When a builder opens an app, the controller assigns a pod from the prewarm pool or provisions a new one. When the session ends, the pod is recycled.
| Component | CPU | Memory | Replicas |
|---|---|---|---|
| agent-sandbox-controller | ¼ core | 256 MiB | 1 |
| agent-sandbox-proxy | ¼ core | 256 MiB | 1 |
| agent-sandbox-job pods | 1 core | 2 GiB | ~1 per active editor |
The controller and proxy do not need to be replicated.
Prewarm pool
A pool of idle pods is kept ready to eliminate cold-start latency when builders open new sessions. The default is 5 pods:
agentSandbox:
  controller:
    scaling:
      prewarmPoolSize: 5
Set prewarmPoolSize to 0 if startup latency is acceptable and reducing idle resource consumption is a priority. Builders will still get a sandbox session, but they'll wait for a pod to provision, which can take up to a minute or more depending on your environment.
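To make the trade-off concrete, here is a toy model of a prewarm pool. It is only a sketch under stated assumptions: `SandboxPool`, `assign`, and `refill` are hypothetical names, pod names are invented, and the real controller's refill timing is not modeled.

```python
from collections import deque
from itertools import count


class SandboxPool:
    """Toy model: sessions take an idle pod if one exists, else cold-start."""

    def __init__(self, prewarm_pool_size: int = 5):
        self.prewarm_pool_size = prewarm_pool_size
        self._ids = count(1)
        # Idle pods that are ready before any session asks for one.
        self.idle = deque(self._provision() for _ in range(prewarm_pool_size))

    def _provision(self) -> str:
        # Stand-in for starting a pod; the name format is illustrative.
        return f"sandbox-pod-{next(self._ids)}"

    def assign(self):
        """Return (pod_name, cold_start) for a new editing session."""
        if self.idle:
            return self.idle.popleft(), False
        return self._provision(), True  # builder waits for provisioning

    def refill(self) -> None:
        """Top the idle pool back up to the configured size."""
        while len(self.idle) < self.prewarm_pool_size:
            self.idle.append(self._provision())


pool = SandboxPool(prewarm_pool_size=2)
for _ in range(3):
    print(pool.assign())  # the third session arrives before a refill
```

With a pool size of 0, every call to `assign` in this model is a cold start, which corresponds to the startup wait described above.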
Node autoscaling
The sandbox controller scales the pod pool automatically, but it can only provision pods up to your cluster's current node capacity. For the sandbox to scale beyond existing node resources, your cluster needs node autoscaling enabled:
- Amazon EKS: Cluster Autoscaler or Karpenter. Karpenter requires additional configuration; refer to the cloud-specific notes in the Kubernetes deployment guide.
- GKE: GKE cluster autoscaler or Autopilot.
- AKS: AKS cluster autoscaler.
Without node autoscaling, the sandbox pool is limited by your current cluster capacity. If that ceiling is reached, new editing sessions won't get a working sandbox until capacity is freed or additional nodes are provisioned.
Capacity planning
To estimate sandbox resource requirements at peak usage:
- Count the maximum number of builders you expect to be simultaneously active.
- Add prewarmPoolSize (default: 5) for idle pool overhead.
- Multiply by per-pod resources: 1 CPU core and 2 GiB memory.
For example, 20 concurrent editors with the default prewarm pool size require capacity for 25 pods (25 CPU cores and 50 GiB of memory for the sandbox alone).
The default maximum total sandbox job pods is 50 (agentSandbox.controller.scaling.maxTotalJobs). If your calculation exceeds this, raise the cap explicitly; otherwise the pool will silently stop growing and editors beyond the cap won't get a working sandbox session:
agentSandbox:
  controller:
    scaling:
      maxTotalJobs: 75 # raise from default 50 as needed
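The capacity arithmetic above can be sketched as a small calculation. This is an illustration rather than a Retool tool: `estimate_sandbox_capacity` and `SandboxEstimate` are hypothetical names, and the sketch assumes prewarmed pods count toward the maxTotalJobs cap.

```python
from dataclasses import dataclass

CPU_CORES_PER_POD = 1   # per-pod CPU from the table above
MEMORY_GIB_PER_POD = 2  # per-pod memory from the table above


@dataclass
class SandboxEstimate:
    pods: int
    cpu_cores: int
    memory_gib: int
    exceeds_cap: bool  # True when maxTotalJobs would need to be raised


def estimate_sandbox_capacity(peak_editors: int,
                              prewarm_pool_size: int = 5,
                              max_total_jobs: int = 50) -> SandboxEstimate:
    """Estimate peak sandbox resources for a number of concurrent editors."""
    pods = peak_editors + prewarm_pool_size
    return SandboxEstimate(
        pods=pods,
        cpu_cores=pods * CPU_CORES_PER_POD,
        memory_gib=pods * MEMORY_GIB_PER_POD,
        # Assumption: idle prewarmed pods count against the total cap.
        exceeds_cap=pods > max_total_jobs,
    )


print(estimate_sandbox_capacity(20))  # the worked example above
print(estimate_sandbox_capacity(60))  # would need a higher maxTotalJobs
```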
On Docker Compose, the sandbox pool is limited by the host VM's total memory and CPU. There is no node autoscaling on a single VM. If you expect more than a small number of concurrent editors, plan to size the VM accordingly or migrate to a Kubernetes deployment.
Concurrency limits
Each agent thread runs in a dedicated sandbox session (pod). A user can have up to 5 concurrent threads open by default, which means up to 5 sandbox sessions held simultaneously. This per-user limit is separate from the deployment-wide pod cap set by agentSandbox.controller.scaling.maxTotalJobs in values.yaml.
When all of a user's sessions are active and they open a new thread, the thread fails to start rather than evicting an existing live session. If at least one session is idle, the system automatically reclaims the oldest idle session and assigns it to the new thread. The reclamation happens in the background and does not interrupt active work.
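The per-user behavior described above can be sketched as follows. This is a simplified model, not Retool's implementation: `UserSessions`, `Session`, and `idle_since` are hypothetical names, and "oldest idle session" is assumed to mean the session that has been idle the longest.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_THREADS_PER_USER = 5  # default per-user limit from the text above


@dataclass
class Session:
    thread_id: str
    idle_since: Optional[float] = None  # None means the session is active


class UserSessions:
    """Toy model of one user's sandbox sessions."""

    def __init__(self, limit: int = DEFAULT_THREADS_PER_USER):
        self.limit = limit
        self.sessions = []

    def open_thread(self, thread_id: str) -> Session:
        if len(self.sessions) >= self.limit:
            idle = [s for s in self.sessions if s.idle_since is not None]
            if not idle:
                # Every session is live: fail instead of evicting active work.
                raise RuntimeError("all sessions are active; thread cannot start")
            # Reclaim the session that went idle earliest.
            oldest = min(idle, key=lambda s: s.idle_since)
            self.sessions = [s for s in self.sessions if s is not oldest]
        session = Session(thread_id)
        self.sessions.append(session)
        return session


user = UserSessions(limit=2)
first = user.open_thread("thread-a")
user.open_thread("thread-b")
first.idle_since = 100.0          # thread-a goes idle
user.open_thread("thread-c")      # reclaims thread-a's session
print([s.thread_id for s in user.sessions])
```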
Idle timeout
Sandboxes shut down automatically after a configurable idle period. The default is 2.5 hours. When a session times out, Retool releases the reserved CPU and memory back to the pool.
In high-traffic deployments where usage patterns are unpredictable, reducing the idle timeout to 30 minutes or less helps reclaim resources faster.
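To see how a shorter timeout reclaims resources sooner, the sketch below lists which sessions would be shut down at a given moment. It is illustrative only: `expired_sessions` is a hypothetical helper, and it does not reflect any actual Retool setting name.

```python
from datetime import datetime, timedelta

DEFAULT_IDLE_TIMEOUT = timedelta(hours=2.5)  # default from the text above


def expired_sessions(last_activity: dict,
                     now: datetime,
                     idle_timeout: timedelta = DEFAULT_IDLE_TIMEOUT) -> list:
    """Return the IDs of sessions idle for at least `idle_timeout`.

    `last_activity` maps a session ID to the datetime of its last activity.
    """
    return sorted(
        session_id
        for session_id, seen in last_activity.items()
        if now - seen >= idle_timeout
    )


now = datetime(2024, 1, 1, 12, 0)
activity = {
    "session-a": now - timedelta(hours=3),
    "session-b": now - timedelta(hours=1),
}
print(expired_sessions(activity, now))                         # default timeout
print(expired_sessions(activity, now, timedelta(minutes=30)))  # shorter timeout
```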
Frontend caching with a service worker
Retool installs a service worker in the browser that caches the application frontend and serves it on subsequent page loads. The service worker uses a stale-while-revalidate strategy: it returns the cached version of index.html immediately, then requests the latest version from Retool in the background. If the cached and latest versions differ, Retool automatically refreshes the page so users see the current build.
The cache is version-aware and automatically invalidates when you upgrade Retool. After an upgrade, users land on the previously cached version once, then receive the new version on the next request without needing to clear their browser cache. Users can bypass the cache with a hard refresh (Cmd+Shift+R / Ctrl+Shift+R).
The service worker is enabled by default in self-hosted Retool 3.334 and later. Admins can disable it from Settings > Beta if needed to diagnose page load issues.
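The stale-while-revalidate behavior described in this section can be modeled in a few lines. This is a conceptual sketch in Python, not the service worker itself (which runs as JavaScript in the browser): `StaleWhileRevalidateCache` and `fetch_latest` are hypothetical names, and the background request is modeled as a plain synchronous call.

```python
from typing import Callable, Optional, Tuple


class StaleWhileRevalidateCache:
    """Toy model: serve the cached page, then compare it with the latest."""

    def __init__(self, fetch_latest: Callable[[], str]):
        self._fetch_latest = fetch_latest
        self._cached: Optional[str] = None

    def load(self) -> Tuple[str, bool]:
        """Return (served_content, needs_refresh) for one page load."""
        latest = self._fetch_latest()  # a background request in a real browser
        if self._cached is None:
            # First visit: nothing cached yet, so serve the network response.
            self._cached = latest
            return latest, False
        served = self._cached
        self._cached = latest  # revalidate the cache for the next load
        return served, served != latest


deployed = {"index": "build-1"}
cache = StaleWhileRevalidateCache(lambda: deployed["index"])
print(cache.load())
deployed["index"] = "build-2"  # simulate an upgrade
print(cache.load())            # stale page served once, refresh flagged
print(cache.load())
```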
Near-zero downtime upgrades
All MAJOR.MINOR release upgrades, such as upgrading from 3.334 to 4.0, perform database migrations. Patch release upgrades within the same MAJOR.MINOR release, such as 4.0.1 to 4.0.2, generally do not.
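A quick way to tell which kind of upgrade you are planning is to compare the MAJOR.MINOR parts of the two versions. The helper below is a hypothetical sketch; it assumes tags of the form MAJOR.MINOR or MAJOR.MINOR.PATCH with an optional suffix such as -stable.

```python
def parse_version(tag: str) -> tuple:
    """Parse '4.0.1-stable' or '3.334' into a (major, minor, patch) tuple."""
    core = tag.split("-", 1)[0]              # drop a suffix such as '-stable'
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))          # treat a missing patch as 0
    return tuple(parts[:3])


def expects_migrations(current: str, target: str) -> bool:
    """True when the upgrade changes MAJOR.MINOR, so migrations will run.

    Patch upgrades return False, although the text above notes that they
    only *generally* skip migrations.
    """
    return parse_version(current)[:2] != parse_version(target)[:2]


print(expects_migrations("3.334", "4.0"))                 # MAJOR.MINOR change
print(expects_migrations("4.0.1-stable", "4.0.2-stable")) # patch only
```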
General upgrade strategy
Each container or pod attempts to perform database migrations at startup after an upgrade. In multi-container setups, upgrades must be coordinated carefully to prevent container deadlocks and minimize downtime.
Database migrations are backward-compatible so older containers or pods can continue serving traffic while these migrations run.
Docker-based
- Stop all containers except jobs-runner.
- Start a new jobs-runner container using the latest MAJOR.MINOR.PATCH image tag, such as 4.0.1-stable.
- Wait for the database migrations to complete.
- Stop the old jobs-runner container.
- Start the remaining containers using the new image.
- Verify all containers are healthy before routing traffic.
Kubernetes + Helm
Kubernetes + Helm deployments support rolling upgrades that simplify the process. First, run helm upgrade with the latest release tag. For example:
helm upgrade retool retool/retool \
  --namespace retool \
  --set image.tag=3.196.1-stable
During the upgrade process:
- A new jobs-runner pod is created and starts running migrations.
- All other new pods start but wait for migrations to complete.
- Once all new pods are verified as healthy, the old pods are terminated automatically.
You can monitor the progress of database migrations for the jobs-runner pod using kubectl:
kubectl logs deployment/jobs-runner -n retool
Kubernetes (without Helm)
For Kubernetes deployments without Helm, perform a manual rolling update:
- Patch the jobs-runner deployment to use the new image tag.
- Wait for the new pod to start and complete database migrations.
- Patch the remaining deployments to use the new image.
- Monitor rollout status and ensure all pods are healthy:
kubectl rollout status deployment/<deployment-name> -n retool