Autoscaling
Autoscaling allows your applications to automatically scale horizontally based on resource usage.
This ensures your apps remain responsive under load while helping you optimize costs.
- Scope: Available for applications only
- Product tiers: Available for all Upsun Flex environments
- Environments: Configurable per environment, across development, staging, and production
Scale databases and resources
To vertically scale CPU, RAM, or disk, or to manually scale applications and workers horizontally, see:
Autoscaling availability
The tables below outline where autoscaling and manual scaling are supported, so you can plan your deployments with the right balance of flexibility and control.
Component support
Component | Horizontal autoscaling | Manual scaling (Vertical) |
---|---|---|
Applications (PHP, Node.js, etc.) | Available | Available |
Services (MySQL, Redis, etc.) | Unavailable | Available |
Queues (workers, background jobs) | Unavailable | Available |
Product tier support
Product tier | Horizontal autoscaling | Manual scaling (Vertical) |
---|---|---|
Upsun Flex | Available | Available |
Upsun Fixed | Unavailable | Available |
Environment support
Environment | Horizontal autoscaling | Manual scaling (Vertical) |
---|---|---|
Development | Available | Available |
Staging | Available | Available |
Production | Available | Available |
Scaling trigger support
Trigger | Console |
---|---|
Average CPU (min/max) | Available |
Average Memory (min/max) | Coming |
How autoscaling works
Thresholds
Autoscaling continuously monitors the average CPU utilization across your app’s running instances. You set thresholds: specific CPU usage levels that determine when autoscaling should take action. Your CPU utilization operates between two thresholds: a scale-up threshold and a scale-down threshold.
- Scale-up threshold: If your chosen trigger (e.g. CPU usage) stays above this level for the time period you’ve set (the evaluation period), autoscaling will launch additional instances to share the load.
- Scale-down threshold: If your chosen trigger stays below this level for the time period you’ve set, autoscaling will remove unneeded instances to save resources and costs.
To prevent unnecessary back-and-forth, autoscaling also uses a cooldown window: a short waiting period before another scaling action can be triggered. You can configure this window or keep the default.
Default settings
Autoscaling continuously monitors the configured trigger across your app’s running instances. We will use the average CPU utilization trigger as the primary example for the default settings and examples below.
- Scale-up threshold: 80% CPU for 5 minutes
- Scale-down threshold: 20% CPU for 5 minutes
- Cooldown window: 5 minutes between scaling actions
- Instance limits: 1–8 per environment (region-dependent)
Instance limits
Default instance limits are typically 1–8 instances per environment, but the exact values depend on the region. Some regions may have higher or lower defaults. The scaling settings in your project always reflect the limits for the region where it runs.
When autoscaling is enabled, manual changes to app instance counts are disabled. Vertical resources (CPU/RAM/disk per instance) remain configurable.
Default behaviour (CPU example)
- If CPU stays at 80% or higher for 5 minutes, autoscaling adds an instance.
- If CPU stays at 20% or lower for 5 minutes, autoscaling removes an instance.
- After a scaling action, autoscaling waits 5 minutes before making another change.
This cycle ensures your app automatically scales up during high demand and scales down when demand drops, helping balance performance with cost efficiency.
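The cycle above can be sketched as a small decision function. This is an illustrative model of the logic described here, not Upsun's actual implementation; the function name and structure are assumptions, while the threshold, period, and limit values mirror the defaults above.

```python
SCALE_UP_THRESHOLD = 80    # % CPU (default scale-up threshold)
SCALE_DOWN_THRESHOLD = 20  # % CPU (default scale-down threshold)
EVALUATION_PERIOD = 5      # minutes a threshold must be breached
COOLDOWN = 5               # minutes to wait after any scaling action
MIN_INSTANCES, MAX_INSTANCES = 1, 8

def decide(cpu_samples, minutes_since_last_action, instances):
    """Return the new instance count for one evaluation tick.

    cpu_samples: average CPU (%) per minute, most recent sample last.
    """
    if minutes_since_last_action < COOLDOWN:
        return instances  # still cooling down: no action
    window = cpu_samples[-EVALUATION_PERIOD:]
    if all(s >= SCALE_UP_THRESHOLD for s in window):
        return min(instances + 1, MAX_INSTANCES)   # scale up, capped at max
    if all(s <= SCALE_DOWN_THRESHOLD for s in window):
        return max(instances - 1, MIN_INSTANCES)   # scale down, floored at min
    return instances

# Sustained high CPU for 5 minutes adds an instance (2 -> 3):
print(decide([85, 90, 88, 92, 86], minutes_since_last_action=10, instances=2))
```

Note that a brief dip below the threshold resets the window: every sample in the evaluation period must breach the threshold before an action fires.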
Guardrails and evaluation
Autoscaling gives you control over the minimum and maximum number of instances your app can run. These guardrails ensure your app never scales up or down too far, keeping scaling safe, predictable, and cost-efficient. For example, you might configure:
- Minimum instances: Ensures a minimum number of instances of the configured application are always running (e.g. 2)
- Maximum instances: Prevents runaway scaling (e.g. 8)
- Evaluation period: Time CPU must stay above or below a threshold before action (1–60 minutes)
- Cooldown window: Wait time before any subsequent scaling action (default: 5 minutes)
Manual instance scaling
When autoscaling is enabled, manual instance scaling is disabled. Autoscaling manages instance counts within the min/max guardrails you define.
Enable autoscaling
To enable autoscaling, follow the steps below:
- Open your project in the Console
- Select the environment where you want to enable autoscaling
- Choose Configure resources
- Under the autoscaling column, select Enable
- Configure thresholds, evaluation period, cooldown, and instances as needed
Alerts and metrics
When autoscaling is enabled, the system continuously monitors metrics such as CPU usage, instance count, and request latency. If a defined threshold is crossed, an alert is triggered and the platform automatically responds by adjusting resources.
Scaling activity is visible in several places:
- Metrics dashboards show when scaling has occurred
- Alerts and scaling actions are also visible in the Console:
- Alerts appear with a bell icon (for example, Scaling: CPU for application below 70% for 5 minutes)
- Scaling actions appear with a resources icon (for example, Upscale: 1 instance added to application)
- Alerts and scaling actions are also listed in the CLI as environment.alert and environment.resources.update
- To review detailed scaling events, open the Resources dashboard by navigating to {Select project} > {Select environment} > Resources
Configure alerts
You can also configure notifications for alerts.
For example, by setting up an activity script on environment.alert, you can automatically send yourself an email, a Slack message, or another type of custom notification.
Metric resources
If you’re looking to keep track of your infrastructure and application metrics, see:
Billing and cost impact
Autoscaling projects are billed for the resources that they consume. Instances added through autoscaling are billed the same as if you were to manually configure those resources.
However, each scaling action consumes build minutes, since adding or removing instances triggers a deployment. If your app scales frequently, this can increase build minute usage.
To control costs, avoid overly aggressive settings (e.g. very short evaluation periods).
Billing resources
If you’re looking to keep track of your billing, see:
Best practices for autoscaling
Autoscaling gives you flexibility and resilience, but to get the best results it’s important to configure your app and thresholds thoughtfully. Below are some best practices to help you balance performance, stability, and cost.
Cost & stability
- Set thresholds wisely: Configure realistic scale-up and scale-down thresholds to avoid unnecessary deployments that quickly consume build minutes.
- Smooth spikes: Use longer evaluation periods (10–15 minutes) if your app traffic spikes often, to prevent rapid up-and-down scaling.
- Control instance counts: Define minimum and maximum instances to manage costs while keeping required availability.
- Monitor costs: Track billing and build minute usage after enabling autoscaling, then adjust thresholds as needed.
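To see why a longer evaluation period smooths spikes, here is a minimal sketch assuming, consistent with the behaviour described above, that scaling only fires when every sample in the window breaches the threshold. The function name and values are illustrative:

```python
def breaches_for_full_window(samples, threshold=80, window=10):
    """True only if CPU stayed at or above the threshold for the whole window.

    samples: average CPU (%) per minute, most recent sample last.
    """
    recent = samples[-window:]
    return len(recent) == window and all(s >= threshold for s in recent)

# A 3-minute spike inside a 10-minute evaluation window does not
# trigger a scale-up, so a short burst of traffic causes no deployment:
cpu = [30, 35, 40, 95, 97, 96, 42, 38, 35, 33]
print(breaches_for_full_window(cpu))  # False
```

With a 5-minute window the same trace would still not fire here, but the shorter the window, the more likely a transient spike fills it entirely and consumes build minutes on an unnecessary scale-up.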
Application design
- External services: Use external services such as databases and caches instead of embedding them within the autoscaled applications.
- Keep containers portable: Follow Upsun recommendations for caching and mounts for composable images and single-runtime images.
Cron jobs & long-running tasks
- CPU spikes from jobs: Cron jobs can increase CPU usage and may trigger scale-ups, so factor this into your threshold settings.
- Job continuity: Cron jobs remain bound to their starting container and are not interrupted by scaling, so plan instances accordingly.
Supported services and actions
Autoscaling does not currently support queues or background worker services.
Scaling down to zero instances is also not supported. Use minimum instance counts to define your baseline availability.