Autoscaling
Autoscaling is a feature that automatically adjusts how many instances of your application are running, increasing capacity when demand rises, and reducing it when things are quiet. It helps your app stay responsive under heavy load while keeping your infrastructure costs efficient.
What is autoscaling?
Autoscaling works through horizontal scaling: it adds or removes whole application instances based on resource usage. If CPU or memory utilization stays above a certain threshold for a set time, Upsun automatically adds more instances. If it stays low, Upsun removes unneeded ones. You control these thresholds and limits, so scaling always happens safely and predictably.
- Scope: Available for applications only
- Product tiers: Available for all Upsun Flex environments
- Environments: Configurable per environment - across development, staging, and production
Know your app first
Autoscaling is quick to set up: you can enable it in a few clicks from your environment’s Configure resources tab. However, it’s important to understand your app’s typical performance before turning it on.
Tools like Blackfire can help you identify where your app consumes CPU or memory, so you can set realistic thresholds that reflect your traffic patterns. Blackfire can also help you spot whether autoscaling is likely to benefit your app or if a fixed setup with tuned vertical resources like CPU/RAM would serve you better.
When to use autoscaling
Autoscaling makes the most sense for workloads with variable or unpredictable traffic. It’s especially valuable when:
- You run time-sensitive or customer-facing applications where latency matters.
- Your app experiences seasonal or campaign-driven spikes.
- You want to avoid paying for idle capacity during quieter periods.
Example: When autoscaling works effectively
A retail app sees traffic jump fivefold every Friday evening and during holiday campaigns. By enabling autoscaling, the app automatically adds instances when CPU usage rises and scales back overnight, ensuring smooth checkouts without wasted cost.
Example: When autoscaling might not be needed
An internal dashboard with predictable, low usage may not benefit from autoscaling. In this case, a fixed number of instances and tuned vertical resources (CPU/RAM) can be more cost-effective and stable.
Scale databases and resources
To vertically scale CPU, RAM, or disk, or horizontally scale applications and workers (manually), see:
Autoscaling availability
The tables below outline where autoscaling and manual scaling are supported, so you can plan your deployments with the right balance of flexibility and control.
Component support
| Component | Horizontal autoscaling | Manual scaling (Vertical) |
|---|---|---|
| Applications (PHP, Node.js, etc.) | Available | Available |
| Services (MySQL, Redis, etc.) | Unavailable | Available |
| Queues (workers, background jobs) | Unavailable | Available |
Product tier support
| Product tier | Horizontal autoscaling | Manual scaling (Vertical) |
|---|---|---|
| Upsun Flex | Available | Available |
| Upsun Fixed | Unavailable | Available |
Environment support
| Environment | Horizontal autoscaling | Manual scaling (Vertical) |
|---|---|---|
| Development | Available | Available |
| Staging | Available | Available |
| Production | Available | Available |
Scaling trigger support
| Trigger | Console |
|---|---|
| Average CPU (min/max) | Available |
| Average Memory (min/max) | Available |
How autoscaling works
Thresholds
Autoscaling monitors the average CPU and memory usage of your running app instances.
You define thresholds that determine when new instances are launched or removed.
Your CPU and memory utilization operate between two thresholds: a scale-up threshold and a scale-down threshold.
- Scale-up threshold: If your chosen trigger (e.g. CPU usage) stays above this level for the time period you’ve set (the evaluation period), autoscaling launches additional instances to share the load.
- Scale-down threshold: If your chosen trigger stays below this level for the evaluation period, autoscaling removes unneeded instances to save resources and costs.
To prevent unnecessary back-and-forth, autoscaling also uses a cooldown window: a short waiting period before another scaling action can be triggered. You can configure the cooldown window or keep the default.
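To make the interaction between these settings concrete, here is a minimal sketch in TypeScript. It is an illustrative model only, not Upsun’s implementation: the names, the sampling shape, and the evaluation rule are assumptions, with defaults mirroring the documented values below.

```ts
// Illustrative model only - not Upsun's implementation. It shows how the
// scale-up threshold, scale-down threshold, evaluation period, and cooldown
// window interact. Defaults mirror the documented values (80% / 20% / 5 min).

interface MetricSample {
  timestampSec: number;    // when the sample was taken
  avgUtilization: number;  // average across running instances, 0.0-1.0
}

interface AutoscalerConfig {
  scaleUpThreshold: number;    // e.g. 0.80
  scaleDownThreshold: number;  // e.g. 0.20
  evaluationPeriodSec: number; // how long the trigger must hold, e.g. 300
  cooldownSec: number;         // wait after any scaling action, e.g. 300
  minInstances: number;        // guardrail, e.g. 1
  maxInstances: number;        // guardrail, e.g. 8
}

type Decision = "scale-up" | "scale-down" | "no-op";

function decide(
  cfg: AutoscalerConfig,
  samples: MetricSample[],     // recent metric samples
  instances: number,           // current instance count
  lastScaleActionSec: number,  // timestamp of the previous scaling action
  nowSec: number,
): Decision {
  // 1. The cooldown window gates everything: no action until it has elapsed.
  if (nowSec - lastScaleActionSec < cfg.cooldownSec) return "no-op";

  // 2. Only samples inside the evaluation period count.
  const recent = samples.filter(s => nowSec - s.timestampSec <= cfg.evaluationPeriodSec);
  if (recent.length === 0) return "no-op";

  // 3. The trigger must stay beyond a threshold for the WHOLE period,
  //    and the min/max guardrails always clamp the result.
  if (recent.every(s => s.avgUtilization >= cfg.scaleUpThreshold) && instances < cfg.maxInstances) {
    return "scale-up";
  }
  if (recent.every(s => s.avgUtilization <= cfg.scaleDownThreshold) && instances > cfg.minInstances) {
    return "scale-down";
  }
  return "no-op";
}
```

The property to notice is that the cooldown gates everything, and a scaling action only fires when the trigger holds beyond a threshold for the whole evaluation period.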
Combined triggers
If both CPU and memory triggers are enabled, either one can initiate scaling. A global cooldown applies after each scaling event, but in rare cases combined triggers may interact unexpectedly - for example, a CPU-driven scale-up followed by a memory-driven scale-down. Adjust thresholds and cooldowns carefully to avoid oscillation.
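Continuing the illustrative model above (again, a sketch rather than Upsun’s actual logic), combined triggers can be modeled as evaluating each metric independently, with either one allowed to act and a single global cooldown gating both. The CPU-first ordering here is purely a modeling choice, not documented behavior:

```ts
// Illustrative only, reusing decide() and the types from the sketch above.
// Each metric is evaluated independently; either may act, and one global
// cooldown (lastScaleActionSec) gates both.
function decideCombined(
  cfg: AutoscalerConfig,
  cpuSamples: MetricSample[],
  memSamples: MetricSample[],
  instances: number,
  lastScaleActionSec: number,
  nowSec: number,
): Decision {
  const byCpu = decide(cfg, cpuSamples, instances, lastScaleActionSec, nowSec);
  if (byCpu !== "no-op") return byCpu; // CPU-first ordering is a modeling choice
  return decide(cfg, memSamples, instances, lastScaleActionSec, nowSec);
}
```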
Default settings
Autoscaling continuously monitors the configured trigger across your app’s running instances. The default settings below use the average CPU utilization trigger as the primary example.
- Scale-up threshold: 80% CPU for 5 minutes
- Scale-down threshold: 20% CPU for 5 minutes
- Cooldown window: 5 minutes between scaling actions
- Instance limits: 1–8 per environment (region-dependent)
Instance limits and counts
- Default instance limits are typically 1–8 instances per environment, but the exact values depend on the region. Some regions may have higher or lower defaults. The scaling settings in your project always reflect the limits for the region where it runs.
- When autoscaling is enabled, manual instance count changes to apps are disabled. Vertical resources (CPU/RAM/disk per instance) remain configurable.
- To understand how an application’s instances are distributed, you can view the instance details in the project’s /run/peers.json file. To learn more, see View application instance details in the “Resource configuration” topic.
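As a minimal sketch, assuming a Node.js runtime is available inside the app container (reached, for instance, via upsun ssh), you could pretty-print the file to inspect the distribution. The file’s exact schema isn’t assumed here:

```ts
// Hypothetical sketch: pretty-print /run/peers.json from inside an app
// container. We only parse and display it; the JSON shape is not assumed.
import { readFileSync } from "node:fs";

const peers = JSON.parse(readFileSync("/run/peers.json", "utf8"));
console.log(JSON.stringify(peers, null, 2));
```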
Default behavior (CPU example)
- If CPU stays at 80% or higher for 5 minutes, autoscaling adds an instance.
- If CPU stays at 20% or lower for 5 minutes, autoscaling removes an instance.
- After a scaling action, autoscaling waits 5 minutes before making another change.
This cycle ensures your app automatically scales up during high demand and scales down when demand drops, helping balance performance with cost efficiency.
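Continuing the illustrative model from earlier (not Upsun’s implementation), here is how that default cycle plays out with hypothetical sample data:

```ts
// Worked example (illustrative): five minutes of samples at 85% CPU with the
// default config yields a scale-up; afterwards the cooldown blocks new actions.
const cfg: AutoscalerConfig = {
  scaleUpThreshold: 0.80, scaleDownThreshold: 0.20,
  evaluationPeriodSec: 300, cooldownSec: 300,
  minInstances: 1, maxInstances: 8,
};
const now = 1_000_000;
const samples: MetricSample[] = [300, 240, 180, 120, 60, 0].map(ago => ({
  timestampSec: now - ago,
  avgUtilization: 0.85,
}));
console.log(decide(cfg, samples, 2, now - 600, now)); // "scale-up"
console.log(decide(cfg, samples, 2, now - 60, now));  // "no-op" (cooldown active)
```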
Memory-based autoscaling
Autoscaling primarily relies on CPU utilization as its trigger. However, you can also configure memory-based autoscaling, which works in a similar way but with a few important differences to understand.
CPU-based triggers
CPU-based autoscaling reacts to sustained changes in average CPU utilization.
- Scale-up threshold: When average CPU usage stays above your defined limit for the evaluation period, instances are added to distribute the load.
- Scale-down threshold: When CPU usage remains below your lower limit for the evaluation period, instances are removed to save resources.
- Cooldown window: A delay (default: 5 minutes) before another scaling action can occur.
Memory-based triggers
Memory-based autoscaling follows the same principle as CPU triggers but measures average memory utilization instead. When your app consistently uses more memory than your upper threshold, Upsun adds instances; when memory usage remains low, it removes them.
This option is useful for workloads where caching or in-memory data handling determines performance - for example, large data processing apps or services with persistent caching layers.
Example
| Condition | Scaling action |
|---|---|
| Memory above 80% for 5 minutes | Scale up: Add one instance |
| Memory below 30% for 5 minutes | Scale down: Remove one instance |
Understand your app’s memory profile
High memory usage doesn’t always mean your app needs more instances. Linux systems use available memory for caching and buffering, so 90–100% usage can be normal even under stable conditions. Before using memory-based autoscaling, profile your application’s typical memory behavior to avoid unnecessary scaling and extra cost.
Tools such as Blackfire or system-level metrics in your Application metrics dashboard can help you understand what “normal” looks like for your app.
Configure memory triggers
- Open your project in the Console.
- Select your target environment.
- Choose Configure resources.
- Under Autoscaling, select Enable (if not already enabled).
- Choose Memory usage (min/max) as your scaling trigger.
- Set scale-up and scale-down thresholds, evaluation period, and cooldown window.
- Save changes. Your app will now automatically scale based on memory utilization.
Guardrails and evaluation
Autoscaling gives you control over the minimum and maximum number of instances your app can run. These guardrails ensure your app never scales up or down too far, keeping scaling safe, predictable, and cost-efficient. For example, you might configure:
- Minimum instances: ensures a minimum number of instances of the configured application is always running (e.g. 2)
- Maximum instances: prevents runaway scaling (e.g. 8)
- Evaluation period: the time the trigger must stay above or below a threshold before action (1–60 minutes)
- Cooldown window: the wait time before any subsequent scaling action (default: 5 minutes)
Manual instance scaling
When autoscaling is enabled, manual instance scaling is disabled. Autoscaling manages instance counts within the min/max guardrails you define.
Enable autoscaling
To enable autoscaling, follow the steps below:
- Open your project in the Console
- Select the environment where you want to enable autoscaling
- Choose Configure resources
- Under the Autoscaling column, select Enable
- Configure thresholds, evaluation period, cooldown, and instances as needed
Alerts and metrics
When autoscaling is enabled, the system continuously monitors metrics such as CPU usage, instance count, and request latency. If a defined threshold is crossed, an alert is triggered and the platform automatically responds by adjusting resources.
Scaling activity is visible in several places:
- Metrics dashboards show when scaling has occurred
- Alerts and scaling actions are also visible in the Console:
- Alerts appear with a bell icon (for example, Scaling: CPU for application below 70% for 5 minutes)
- Scaling actions appear with a resources icon (for example, Upscale: 1 instance added to application)
- Alerts and scaling actions are also listed in the CLI as environment.alert and environment.resources.update activities
- To review detailed scaling events, open the Resources dashboard by navigating to {Select project} > {Select environment} > Resources
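If you prefer the CLI, you can filter the activity log by these types - for example, `upsun activity:list --type environment.alert` to list recent alerts for the current environment (assuming the standard `activity:list` options).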
Configure alerts
You can also configure notifications for alerts.
For example, by setting up an activity script on environment.alert, you can automatically send yourself an email, a Slack message, or another type of custom notification.
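Here is a minimal sketch of such a script, written in plain JavaScript as activity scripts are. It assumes the activity-script runtime’s `activity` object and synchronous `fetch`; the Slack webhook URL is a placeholder:

```js
// Minimal sketch of an activity script. The runtime provides `activity` and
// a synchronous `fetch`; the webhook URL below is a placeholder.
if (activity.type === 'environment.alert') {
  const resp = fetch('https://hooks.slack.com/services/YOUR/WEBHOOK/URL', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: 'Autoscaling alert: ' + activity.text }),
  });
  if (!resp.ok) {
    console.log('Failed to send Slack notification');
  }
}
```

See the activity scripts documentation for how to add a script like this to your project.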
Metric resources
If you’re looking to keep track of your infrastructure and application metrics, see:
Billing and cost impact
Autoscaling projects are billed for the resources that they consume. Instances added through autoscaling are billed the same as if you were to manually configure those resources.
Added instances are deployed automatically without downtime.
To control costs, avoid overly aggressive settings (e.g. very short evaluation periods).
Billing resources
If you’re looking to keep track of your billing, see:
Best practices for autoscaling
Autoscaling gives you flexibility and resilience, but to get the best results it’s important to configure your app and thresholds thoughtfully. Below are some best practices to help you balance performance, stability, and cost.
Cost & stability
- Set thresholds wisely: Configure realistic scale-up and scale-down thresholds to avoid unnecessary deployments.
- Smooth spikes: Use longer evaluation periods (10–15 minutes) if your app traffic spikes often, to prevent rapid up-and-down scaling.
- Control instance counts: Define minimum and maximum instances to manage costs while keeping required availability.
- Monitor costs: Track billing and usage after enabling autoscaling, then adjust thresholds as needed.
Application design
- External services: Use external services such as databases and caches instead of embedding them within the autoscaled applications.
- Keep containers portable: Follow Upsun recommendations for caching and mounts for composable images and single-runtime images.
Cron jobs & long-running tasks
- CPU spikes from jobs: Cron jobs can increase CPU usage and may trigger scale-ups, so factor this into your threshold settings.
- Job continuity: Cron jobs remain bound to their starting container and are not interrupted by scaling, so plan instances accordingly.
Supported services and actions
Autoscaling does not currently support queues or background worker services.
Scaling down to zero instances is also not supported. Use minimum instance counts to define your baseline availability.