OCI Instance Pools and Autoscaling: Building a Production-Grade Compute Scaling Architecture with Terraform

Vertical scaling on OCI is straightforward: stop the instance, change the shape, start it again. It works, but it does not solve the problem you face at 9am on a Monday when traffic doubles in ten minutes and you need twenty more instances, not one bigger one. That calls for horizontal scaling, and doing it properly on OCI requires understanding how Instance Configurations, Instance Pools, and Autoscaling Configurations work together.

Most teams get to instance pools quickly. They read the docs, create a pool with a fixed size, and think they are done. What they miss is the autoscaling layer on top, the load balancer backend set attachment that makes the pool actually serve traffic, the health check configuration that removes unhealthy instances before they receive requests, and the custom metric path that scales on application-level signals instead of just CPU.

This post covers all of it: the full Terraform implementation of a production autoscaling group behind a load balancer, health checks, scaling policies using both metric-based and schedule-based triggers, and custom metric publishing so you can scale on queue depth or request latency instead of raw CPU utilization.

How the Components Fit Together

Before writing any Terraform, the relationship between the three core resources matters.

An Instance Configuration is a template. It defines the compute shape, the OS image, the boot volume size, the VCN subnet placement, the cloud-init script, and any attached block volumes. The Instance Configuration itself does not run anything. It is a snapshot of how an instance should be created.

An Instance Pool uses that template to create and manage a group of identically configured instances. The pool maintains a target size, handles replacements when an instance becomes unhealthy, and integrates with the OCI Load Balancer to register and deregister instances automatically as they join or leave the pool.

An Autoscaling Configuration sits on top of the pool and adjusts the target size based on rules you define. It can scale out when CPU exceeds a threshold, scale in when it drops, and follow a fixed schedule for predictable load patterns.

Step 1: Instance Configuration

The cloud-init script inside the instance configuration is where you install your application, configure the OCI Monitoring agent for custom metrics, and register the instance with your configuration management system. Keep it idempotent.

			
data "oci_core_images" "ol8_image" {
  compartment_id           = var.compartment_id
  operating_system         = "Oracle Linux"
  operating_system_version = "8"
  shape                    = "VM.Standard.E4.Flex"
  sort_by                  = "TIMECREATED"
  sort_order               = "DESC"
  filter {
    name   = "display_name"
    values = ["^.*Oracle-Linux-8.*$"]
    regex  = true
  }
}
resource "oci_core_instance_configuration" "app_instance_config" {
  compartment_id = var.compartment_id
  display_name   = "orders-api-instance-config-v${var.app_version}"
  instance_details {
    instance_type = "compute"
    launch_details {
      compartment_id = var.compartment_id
      display_name   = "orders-api-node"
      shape          = "VM.Standard.E4.Flex"
      shape_config {
        ocpus         = 2
        memory_in_gbs = 16
      }
      source_details {
        source_type             = "image"
        image_id                = data.oci_core_images.ol8_image.images[0].id
        boot_volume_size_in_gbs = 50
      }
      create_vnic_details {
        subnet_id             = var.app_subnet_id
        assign_public_ip      = false
        nsg_ids               = [var.app_nsg_id]
        hostname_label_prefix = "orders-api"
      }
      metadata = {
        ssh_authorized_keys = var.ssh_public_key
        user_data           = base64encode(templatefile("${path.module}/templates/cloud-init.yaml", {
          app_version        = var.app_version
          compartment_id     = var.compartment_id
          region             = var.region
          monitoring_enabled = "true"
        }))
      }
      defined_tags = {
        "Operations.Environment" = "production"
        "Operations.Application" = "orders-api"
        "Operations.ManagedBy"   = "terraform"
      }
    }
  }
}

		

The cloud-init template at templates/cloud-init.yaml:

			
#cloud-config
runcmd:
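  # Make sure Oracle Cloud Agent is running; its Monitoring plugin emits the
  # CPU and memory metrics that metric-based autoscaling reads
  - dnf install -y oracle-cloud-agent
  - systemctl enable oracle-cloud-agent
  - systemctl start oracle-cloud-agent
  # Create the service account and directories the application expects
  - id app >/dev/null 2>&1 || useradd --system --shell /sbin/nologin app
  - mkdir -p /opt/orders-api /etc/orders-api
  # Install the application
  - dnf install -y python3.11 python3.11-pip
  - pip3.11 install orders-api==${app_version}
  # Configure the application
  - |
    cat > /etc/orders-api/config.yaml <<EOF
    environment: production
    compartment_id: ${compartment_id}
    region: ${region}
    metrics_namespace: custom_orders_api
    EOF
  # Pick up the unit file written by write_files, then start the application
  - systemctl daemon-reload
  - systemctl enable orders-api
  - systemctl start orders-api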
write_files:
  - path: /etc/systemd/system/orders-api.service
    content: |
      [Unit]
      Description=Orders API Service
      After=network.target
      [Service]
      Type=simple
      User=app
      ExecStart=/usr/local/bin/orders-api serve
      Restart=always
      RestartSec=5
      Environment=CONFIG_FILE=/etc/orders-api/config.yaml
      [Install]
      WantedBy=multi-user.target

		

Step 2: Instance Pool

			
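# Availability domains referenced by the placement configurations below
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.compartment_id
}
resource "oci_core_instance_pool" "orders_api_pool" {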
  compartment_id            = var.compartment_id
  instance_configuration_id = oci_core_instance_configuration.app_instance_config.id
  display_name              = "orders-api-pool"
  size                      = 2
  placement_configurations {
    availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
    primary_subnet_id   = var.app_subnet_id
    fault_domains       = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]
  }
  placement_configurations {
    availability_domain = data.oci_identity_availability_domains.ads.availability_domains[1].name
    primary_subnet_id   = var.app_subnet_id_ad2
    fault_domains       = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]
  }
  load_balancers {
    backend_set_name = oci_load_balancer_backend_set.orders_api_backend.name
    load_balancer_id = oci_load_balancer.orders_lb.id
    port             = 8080
    vnic_selection   = "PrimaryVnic"
  }
  defined_tags = {
    "Operations.Environment" = "production"
    "Operations.Application" = "orders-api"
  }
}

		

Two placement configurations across two availability domains with all three fault domains specified in each. This spreads instances evenly across the physical failure domains within each AD. A single hardware failure affecting one fault domain takes out at most one third of your capacity in one AD, not all of it.
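To make the "one third of one AD" arithmetic concrete, here is a small sketch that spreads a pool across every AD and fault domain slot and reports the worst-case loss from a single fault domain failure. It assumes an idealized round-robin spread; the `spread` and `worst_case_loss` helpers are illustrative names, not part of any OCI SDK, and the real placement logic may differ.

```python
from itertools import cycle


def spread(pool_size, ads, fault_domains):
    """Round-robin instances across every (AD, fault domain) slot."""
    # Alternate ADs first so that even a pool of 2 lands in both ADs
    slots = [(ad, fd) for fd in fault_domains for ad in ads]
    counts = {slot: 0 for slot in slots}
    for _, slot in zip(range(pool_size), cycle(slots)):
        counts[slot] += 1
    return counts


def worst_case_loss(counts):
    """Largest share of the pool lost if one fault domain in one AD fails."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


ads = ["AD-1", "AD-2"]
fds = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]
layout = spread(12, ads, fds)
for slot, count in layout.items():
    print(slot, count)
print(f"worst-case loss from one fault domain: {worst_case_loss(layout):.0%}")
```

With two ADs the share lost is one sixth of the whole pool, which is one third of the capacity in the affected AD.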

The load_balancers block registers the pool with the load balancer backend set automatically. When the pool adds an instance, OCI registers it with the backend set. When it removes one, OCI deregisters it before terminating the instance so it drains connections cleanly.

Step 3: Load Balancer and Health Check

			
resource "oci_load_balancer" "orders_lb" {
  compartment_id             = var.compartment_id
  display_name               = "orders-api-lb"
  shape                      = "flexible"
  subnet_ids                 = [var.public_subnet_id]
  is_private                 = false
  shape_details {
    minimum_bandwidth_in_mbps = 10
    maximum_bandwidth_in_mbps = 400
  }
  defined_tags = {
    "Operations.Environment" = "production"
    "Operations.Application" = "orders-api"
  }
}
resource "oci_load_balancer_backend_set" "orders_api_backend" {
  name             = "orders-api-backend-set"
  load_balancer_id = oci_load_balancer.orders_lb.id
  policy           = "LEAST_CONNECTIONS"
  health_checker {
    protocol            = "HTTP"
    port                = 8080
    url_path            = "/health"
    interval_ms         = 10000
    timeout_in_millis   = 3000
    retries             = 3
    return_code         = 200
    response_body_regex = ".*\"status\":\"healthy\".*"
  }
  session_persistence_configuration {
    cookie_name      = "orders_session"
    disable_fallback = false
  }
}
resource "oci_load_balancer_listener" "orders_https" {
  load_balancer_id         = oci_load_balancer.orders_lb.id
  name                     = "orders-https-listener"
  default_backend_set_name = oci_load_balancer_backend_set.orders_api_backend.name
  port                     = 443
  protocol                 = "HTTP"
  ssl_configuration {
    certificate_name        = oci_load_balancer_certificate.orders_cert.certificate_name
    verify_peer_certificate = false
    protocols               = ["TLSv1.2", "TLSv1.3"]
    cipher_suite_name       = "oci-wider-compatible-ssl-cipher-suite-v1"
  }
  connection_configuration {
    idle_timeout_in_seconds = 60
  }
}

		

The health checker uses response_body_regex to validate the response body, not just the HTTP status code. Your /health endpoint should return a JSON payload that confirms the application is ready to serve traffic, not just that the process is running. A process can be alive but unable to connect to the database, which makes it unhealthy from a request-serving perspective even though it returns 200.
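A minimal sketch of a /health endpoint that satisfies this check, using only the Python standard library. The `database_is_reachable` probe is a hypothetical placeholder for whatever dependency check your service needs. One detail worth noticing: the regex expects no space after the colon, so the JSON has to be serialized compactly.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Same pattern as response_body_regex in the backend set
HEALTH_REGEX = re.compile(r'.*"status":"healthy".*')


def database_is_reachable() -> bool:
    """Hypothetical dependency probe; replace with a real connection check."""
    return True


def health_payload(db_ok: bool):
    """Build the (status code, body) pair for /health."""
    status = "healthy" if db_ok else "degraded"
    # Compact separators matter: the default json.dumps output puts a space
    # after the colon, which the health check regex would not match.
    body = json.dumps({"status": status}, separators=(",", ":")).encode()
    return (200 if db_ok else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_payload(database_is_reachable())
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080):
    """Run the health endpoint on the port the backend set probes."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8080. Returning 503 with a non-matching body when a dependency is down fails both the status code check and the body check.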

Step 4: Metric-Based Autoscaling

The default metric-based autoscaling policy uses CPU utilization. This works for CPU-bound workloads but misses the mark for I/O-bound services where CPU stays low while request queues build up.

			
resource "oci_autoscaling_auto_scaling_configuration" "orders_api_asc" {
  compartment_id       = var.compartment_id
  display_name         = "orders-api-autoscaling"
  is_enabled           = true
  cool_down_in_seconds = 300
  auto_scaling_resources {
    id   = oci_core_instance_pool.orders_api_pool.id
    type = "instancePool"
  }
  policies {
    display_name = "cpu-scale-out"
    policy_type  = "threshold"
    capacity {
      initial = 2
      min     = 2
      max     = 20
    }
    rules {
      display_name = "scale-out-on-high-cpu"
      action {
        type  = "CHANGE_COUNT_BY"
        value = 2
      }
      metric {
        metric_type = "CPU_UTILIZATION"
        threshold {
          operator = "GT"
          value    = 75
        }
      }
    }
    rules {
      display_name = "scale-in-on-low-cpu"
      action {
        type  = "CHANGE_COUNT_BY"
        value = -1
      }
      metric {
        metric_type = "CPU_UTILIZATION"
        threshold {
          operator = "LT"
          value    = 25
        }
      }
    }
  }
}

		

The cool_down_in_seconds = 300 prevents the autoscaler from firing again within five minutes of the last scaling action. Without this, a sudden traffic spike triggers a scale-out, the new instances come online, CPU drops, a scale-in fires immediately, the instances terminate, CPU climbs again, and you get an oscillation loop. Five minutes gives new instances time to warm up and take on load before the next evaluation.

Scale out by 2, scale in by 1. Always scale out faster than you scale in. The cost of having one extra instance for a few minutes is trivial compared to the cost of serving degraded traffic because you removed capacity too aggressively.
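The interaction between thresholds, step sizes, and cooldown is easier to see in a toy simulation. This is a simplified model of the policy above, not the service's actual evaluation logic; `ThresholdPolicy` and `evaluate` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    min_size: int = 2
    max_size: int = 20
    scale_out_cpu: float = 75.0
    scale_in_cpu: float = 25.0
    scale_out_by: int = 2
    scale_in_by: int = 1
    cool_down_s: int = 300


def evaluate(policy, size, avg_cpu, now_s, last_action_s):
    """Return (new_size, last_action_s) for one evaluation tick."""
    if last_action_s is not None and now_s - last_action_s < policy.cool_down_s:
        return size, last_action_s  # still cooling down, ignore the metric
    if avg_cpu > policy.scale_out_cpu:
        target = min(size + policy.scale_out_by, policy.max_size)
    elif avg_cpu < policy.scale_in_cpu:
        target = max(size - policy.scale_in_by, policy.min_size)
    else:
        target = size
    if target != size:
        return target, now_s  # a resize restarts the cooldown clock
    return size, last_action_s


policy = ThresholdPolicy()
size, last = 2, None
# (seconds since start, pool-average CPU %) samples
samples = [(0, 90), (60, 20), (120, 20), (300, 20), (360, 20), (660, 50)]
for t, cpu in samples:
    size, last = evaluate(policy, size, cpu, t, last)
    print(f"t={t:>3}s cpu={cpu}% size={size}")
```

The low CPU readings at 60 and 120 seconds fall inside the cooldown window, so the pool keeps the capacity it just added instead of oscillating.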

Step 5: Custom Metric Autoscaling

CPU-based scaling is not enough for most production services. A better signal is often active request queue depth or response latency percentile. If your application publishes custom metrics to OCI Monitoring, you can scale on those instead.

Here is how the application publishes a custom metric from Python:

			
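import oci
from datetime import datetime, timezone


def publish_queue_depth_metric(queue_depth: int, compartment_id: str):
    # Pool instances have no ~/.oci/config file, so authenticate as an instance
    # principal. Assumes a dynamic group and policy allow the instances to post metrics.
    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
    monitoring_client = oci.monitoring.MonitoringClient(
        config={},
        signer=signer,
        # Custom metrics are posted to the telemetry-ingestion endpoint
        service_endpoint="https://telemetry-ingestion.{}.oraclecloud.com".format(signer.region)
    )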
    metric_data = oci.monitoring.models.PostMetricDataDetails(
        metric_data=[
            oci.monitoring.models.MetricDataDetails(
                namespace="custom_orders_api",
                compartment_id=compartment_id,
                name="RequestQueueDepth",
                dimensions={
                    "environment": "production",
                    "application": "orders-api"
                },
                datapoints=[
                    oci.monitoring.models.Datapoint(
                        timestamp=datetime.now(timezone.utc),
                        value=float(queue_depth)
                    )
                ],
                metadata={
                    "unit": "count",
                    "displayName": "Request Queue Depth"
                }
            )
        ]
    )
    response = monitoring_client.post_metric_data(
        post_metric_data_details=metric_data
    )
    return response.status

		

Call this function every 60 seconds from a background thread in your application. Once the metric appears in OCI Monitoring under the custom_orders_api namespace, you can create an autoscaling rule against it.
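One way to run that 60-second loop, sketched with the standard library only. The reader and publisher are passed in as callables so the loop itself has no OCI dependency; `start_metric_publisher` is an illustrative name, not something the SDK provides.

```python
import threading
from typing import Callable


def start_metric_publisher(
    read_queue_depth: Callable[[], int],
    publish: Callable[[int], object],
    interval_s: float = 60.0,
):
    """Publish the queue depth every interval_s seconds until stopped."""
    stop = threading.Event()

    def loop():
        # Event.wait doubles as the sleep and returns True once stop is set
        while not stop.wait(interval_s):
            try:
                publish(read_queue_depth())
            except Exception as exc:
                # A failed publish should never kill the publisher thread
                print(f"metric publish failed: {exc}")

    thread = threading.Thread(target=loop, name="metric-publisher", daemon=True)
    thread.start()
    return thread, stop
```

In the application you would pass something that reads your real queue as `read_queue_depth` and a small wrapper around `publish_queue_depth_metric` as `publish`, then call `stop.set()` during shutdown.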

OCI’s native autoscaling configuration only supports CPU_UTILIZATION and MEMORY_UTILIZATION as built-in metric types. To scale on a custom metric you pair OCI Monitoring alarms with an OCI Functions trigger that calls the Instance Pool resize API directly.

			
resource "oci_monitoring_alarm" "queue_depth_high" {
  compartment_id        = var.compartment_id
  display_name          = "orders-api-queue-depth-high"
  is_enabled            = true
  metric_compartment_id = var.compartment_id
  namespace             = "custom_orders_api"
  query                 = "RequestQueueDepth[1m]{environment = 'production'}.mean() > 500"
  severity              = "WARNING"
  pending_duration      = "PT2M"
  destinations          = [oci_ons_notification_topic.scaling_topic.id]
  body                  = "Queue depth exceeded 500 for 2 minutes. Scaling out instance pool."
}
resource "oci_monitoring_alarm" "queue_depth_low" {
  compartment_id        = var.compartment_id
  display_name          = "orders-api-queue-depth-low"
  is_enabled            = true
  metric_compartment_id = var.compartment_id
  namespace             = "custom_orders_api"
  query                 = "RequestQueueDepth[5m]{environment = 'production'}.mean() < 100"
  severity              = "INFO"
  pending_duration      = "PT10M"
  destinations          = [oci_ons_notification_topic.scaling_topic.id]
  body                  = "Queue depth below 100 for 10 minutes. Scaling in instance pool."
}
resource "oci_ons_notification_topic" "scaling_topic" {
  compartment_id = var.compartment_id
  name           = "orders-api-scaling-events"
  description    = "Triggers custom metric scaling function"
}
resource "oci_ons_subscription" "scaling_function_sub" {
  compartment_id = var.compartment_id
  topic_id       = oci_ons_notification_topic.scaling_topic.id
  protocol       = "ORACLE_FUNCTIONS"
  endpoint       = oci_functions_function.pool_scaler.id
}

		

The OCI Function that handles the scaling action:

			
import io
import json
import oci
import logging
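logger = logging.getLogger()
logger.setLevel(logging.INFO)
POOL_ID       = "ocid1.instancepool.oc1..."
MIN_SIZE      = 2
MAX_SIZE      = 20
SCALE_OUT_BY  = 2
SCALE_IN_BY   = 1
def handler(ctx, data: io.BytesIO = None):
    try:
        body = json.loads(data.getvalue())
        alarm_body = body.get("body", "")
        alarm_type = body.get("type", "")
        logger.info(f"Received alarm notification ({alarm_type}): {alarm_body}")
    except Exception as ex:
        logger.error(f"Failed to parse notification: {ex}")
        return
    # Alarms also notify when they clear (FIRING_TO_OK) and reuse the same body,
    # so only act on the transition into the firing state
    if alarm_type != "OK_TO_FIRING":
        logger.info("Not a firing transition, no action taken")
        return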
    signer = oci.auth.signers.get_resource_principals_signer()
    compute_mgmt = oci.core.ComputeManagementClient(config={}, signer=signer)
    pool = compute_mgmt.get_instance_pool(POOL_ID).data
    current_size = pool.size
    if "Scaling out" in alarm_body:
        new_size = min(current_size + SCALE_OUT_BY, MAX_SIZE)
        action = "scale-out"
    elif "Scaling in" in alarm_body:
        new_size = max(current_size - SCALE_IN_BY, MIN_SIZE)
        action = "scale-in"
    else:
        logger.info("Unrecognized alarm body, no action taken")
        return
    if new_size == current_size:
        logger.info(f"Already at {'max' if action == 'scale-out' else 'min'} size ({current_size}), no action")
        return
    update_details = oci.core.models.UpdateInstancePoolDetails(size=new_size)
    compute_mgmt.update_instance_pool(POOL_ID, update_details)
    logger.info(f"Pool resize triggered: {current_size} to {new_size} ({action})")

		

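The function authenticates as a resource principal through get_resource_principals_signer(). For that to work, the function has to be matched by a dynamic group, and an IAM policy has to allow that dynamic group to manage instance pools in the compartment. No credentials are stored in the function configuration.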
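The decision logic is worth pulling out into a pure helper so it can be unit-tested without OCI credentials. The sketch below assumes the alarm message carries a "type" field whose value is OK_TO_FIRING when the alarm starts firing; the payloads are hypothetical examples shaped like the alarm bodies defined earlier, and `decide_new_size` is an illustrative name.

```python
import json

MIN_SIZE, MAX_SIZE = 2, 20
SCALE_OUT_BY, SCALE_IN_BY = 2, 1


def decide_new_size(message: dict, current_size: int):
    """Return the target pool size, or None when no resize is needed."""
    # Assumption: clearing notifications arrive with a different "type" but
    # the same body text, so they must not trigger another resize.
    if message.get("type", "OK_TO_FIRING") != "OK_TO_FIRING":
        return None
    body = message.get("body", "")
    if "Scaling out" in body:
        target = min(current_size + SCALE_OUT_BY, MAX_SIZE)
    elif "Scaling in" in body:
        target = max(current_size - SCALE_IN_BY, MIN_SIZE)
    else:
        return None
    return target if target != current_size else None


# Hypothetical payloads for illustration
firing = json.loads(
    '{"type": "OK_TO_FIRING", '
    '"body": "Queue depth exceeded 500 for 2 minutes. Scaling out instance pool."}'
)
cleared = dict(firing, type="FIRING_TO_OK")
print(decide_new_size(firing, 4), decide_new_size(cleared, 4))
```

The handler then only has to fetch the current pool size, call the helper, and issue the update when the result is not None.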

Step 6: Schedule-Based Scaling

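For workloads with predictable patterns, a schedule-based policy sets the pool capacity at fixed times. Business-hours applications can scale up before the working day starts and scale down after it ends, reducing idle capacity costs at night. One caveat before combining this with Step 4: OCI documents a limit of one autoscaling configuration per instance pool, and a configuration holds either a single metric-based policy or a set of schedule-based policies, so check the current Autoscaling documentation before attaching both to the same pool.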

			
resource "oci_autoscaling_auto_scaling_configuration" "orders_api_schedule" {
  compartment_id       = var.compartment_id
  display_name         = "orders-api-schedule-scaling"
  is_enabled           = true
  cool_down_in_seconds = 300
  auto_scaling_resources {
    id   = oci_core_instance_pool.orders_api_pool.id
    type = "instancePool"
  }
  policies {
    display_name = "business-hours-scale-up"
    policy_type  = "scheduled"
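    execution_schedule {
      # Quartz-style fields: second minute hour day-of-month month day-of-week
      # 04:00 UTC is 07:00 in Riyadh (UTC+3, no daylight saving)
      expression = "0 0 4 ? * SUN-THU"
      timezone   = "UTC"
      type       = "cron"
    }
    capacity {
      initial = 6
      min     = 6
      max     = 20
    }
  }
  policies {
    display_name = "after-hours-scale-down"
    policy_type  = "scheduled"
    execution_schedule {
      # 17:00 UTC is 20:00 in Riyadh
      expression = "0 0 17 ? * SUN-THU"
      timezone   = "UTC"
      type       = "cron"
    }
    capacity {
      initial = 2
      min     = 2
      max     = 20
    }
  }
}

		

OCI schedule expressions are not five-field Unix cron. They use a Quartz-style format with a leading seconds field (second, minute, hour, day of month, month, day of week), and the provider documentation lists UTC as the timezone for the schedule. The expression 0 0 4 ? * SUN-THU therefore fires at 04:00 UTC, which is 07:00 in Riyadh, Sunday through Thursday, covering the standard working week in the Gulf region. At 17:00 UTC (20:00 in Riyadh) the pool scales back to the minimum. The max remains at 20 in both schedules so that other scaling actions, such as the alarm-driven function from Step 5, can still expand the pool beyond the scheduled minimum during peak periods.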
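If the schedule has to be expressed in UTC (worth confirming in the provider documentation for your version), converting local wall-clock times by hand is error-prone, especially when the conversion crosses midnight. Here is a small sketch that does the conversion and builds a six-field Quartz-style expression. It models Riyadh as a fixed UTC+3 offset, which is an assumption that would not hold for zones with daylight saving, and the helper names are illustrative.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: Riyadh is modelled as a fixed UTC+3 offset (no daylight saving)
RIYADH = timezone(timedelta(hours=3), "Asia/Riyadh")
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def local_to_utc_slot(day: str, local: time, tz: timezone = RIYADH):
    """Convert a weekly local wall-clock slot to (day, hour, minute) in UTC."""
    # 2024-01-01 was a Monday, so adding the weekday index lands on `day`
    anchor = datetime(2024, 1, 1, local.hour, local.minute, tzinfo=tz)
    moment = (anchor + timedelta(days=DAYS.index(day))).astimezone(timezone.utc)
    return DAYS[moment.weekday()], moment.hour, moment.minute


def quartz_expression(days, local: time) -> str:
    """Build a six-field Quartz-style expression for the given local slots."""
    slots = [local_to_utc_slot(d, local) for d in days]
    times = {(h, m) for _, h, m in slots}
    if len(times) != 1:
        raise ValueError("all days must map to the same UTC time")
    hour, minute = next(iter(times))
    utc_days = ",".join(d for d, _, _ in slots)
    return f"0 {minute} {hour} ? * {utc_days}"


work_week = ["SUN", "MON", "TUE", "WED", "THU"]
print(quartz_expression(work_week, time(7, 0)))
print(quartz_expression(work_week, time(20, 0)))
```

For early-morning local times the UTC day shifts back by one, which is exactly the case the helper is meant to catch.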

Step 7: Rolling Instance Configuration Update

When you need to deploy a new application version, you update the Instance Configuration and then replace pool instances without taking the pool offline.

			
# Create new instance configuration with updated app version
resource "oci_core_instance_configuration" "app_instance_config_v2" {
  compartment_id = var.compartment_id
  display_name   = "orders-api-instance-config-v${var.new_app_version}"
  # Same configuration as v1 with updated user_data referencing new_app_version
  instance_details {
    instance_type = "compute"
    launch_details {
      # ... identical to v1 except user_data references new_app_version
    }
  }
}
# Update the pool to use the new configuration
resource "oci_core_instance_pool" "orders_api_pool" {
  instance_configuration_id = oci_core_instance_configuration.app_instance_config_v2.id
  # ... rest of pool config unchanged
}

		

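Updating instance_configuration_id on the pool does not immediately replace running instances. Existing instances continue running with the old configuration. New instances added by scaling or manual pool resize use the new configuration. To move existing instances to the new version, replace them one at a time: detach an old instance with auto-terminate enabled and without decrementing the pool size, and the pool launches a replacement from its current configuration:

			
# Detach and terminate one old instance; the pool keeps its size and launches a replacement
oci compute-management instance-pool-instance detach \
  --instance-pool-id <pool-ocid> \
  --instance-id <instance-ocid> \
  --is-auto-terminate true \
  --is-decrement-size false
# Wait until the replacement is running and healthy in the backend set,
# then repeat for the next old instance

		

Note that the instance-pool softreset action is not a replacement mechanism: it gracefully reboots the instances already in the pool, so they keep the configuration they were launched with. Repeating the detach step instance by instance, and waiting for each replacement to pass the load balancer health check before moving on, gives you a zero-downtime rolling deploy with only a small script around the CLI.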
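A one-at-a-time replacement loop can be sketched in a few lines. The two callables are hypothetical hooks: `replace_instance` would wrap whatever CLI or SDK call swaps out a single instance, and `pool_is_healthy` would check that the pool is back at its target size with every backend passing the health check. Neither is an OCI API; this shows only the control flow.

```python
import time
from typing import Callable, Iterable, List


def rolling_replace(
    old_instance_ids: Iterable[str],
    replace_instance: Callable[[str], None],
    pool_is_healthy: Callable[[], bool],
    poll_s: float = 15.0,
    timeout_s: float = 900.0,
) -> List[str]:
    """Replace instances one at a time, pausing until the pool is healthy again."""
    replaced = []
    for instance_id in old_instance_ids:
        replace_instance(instance_id)
        deadline = time.monotonic() + timeout_s
        # Do not touch the next instance until capacity is fully restored
        while not pool_is_healthy():
            if time.monotonic() > deadline:
                raise TimeoutError(f"pool not healthy after replacing {instance_id}")
            time.sleep(poll_s)
        replaced.append(instance_id)
    return replaced
```

Raising on timeout stops the rollout with most of the pool still on the old version, which is usually the safer failure mode.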

Validating the Autoscaling Behavior

Generate artificial CPU load to test the scale-out policy:

			
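# SSH into each pool instance via OCI Bastion. The policy evaluates CPU averaged
# across the pool, so loading a single instance may not cross the threshold.
# Then stress the CPU (stress-ng may need to be installed first)
stress-ng --cpu 2 --cpu-load 90 --timeout 600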

Watch the pool size change in real time:

			
watch -n 10 'oci compute-management instance-pool get \
  --instance-pool-id <pool-ocid> \
  --query "data.{size:size, state:\"lifecycle-state\"}" \
  --output table'

List all instances currently in the pool with their lifecycle state and placement:

			
oci compute-management instance-pool list-instances \
  --compartment-id <compartment-ocid> \
  --instance-pool-id <pool-ocid> \
  --query 'data[*].{id:"id", state:"state", ad:"availability-domain", fd:"fault-domain"}' \
  --output table

List the autoscaling configurations in the compartment and confirm that they are enabled:

			
oci autoscaling auto-scaling-configuration list \
  --compartment-id <compartment-ocid> \
  --query 'data[*].{name:"display-name", enabled:"is-enabled"}' \
  --output table

Operational Notes

A few things that matter in production but are easy to miss.

The load balancer backend set policy is set to LEAST_CONNECTIONS. This distributes new connections to the instance with the fewest active connections rather than round-robin. For APIs with variable request duration, this prevents a slow request on one instance from causing it to accumulate a backlog while other instances are idle.
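A toy simulation shows why this matters when request durations vary. It is a simplified model with one request arriving per tick, not a description of how the OCI load balancer is implemented, and the function names are illustrative.

```python
import itertools


def pick_least_connections(active):
    """Choose the backend with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


def simulate(durations, backends, chooser):
    """Assign one request per tick; return the peak concurrent load per backend."""
    finish_times = {b: [] for b in backends}
    peak = {b: 0 for b in backends}
    for tick, duration in enumerate(durations):
        # Drop requests that have already completed by this tick
        for b in backends:
            finish_times[b] = [t for t in finish_times[b] if t > tick]
        active = {b: len(finish_times[b]) for b in backends}
        target = chooser(active)
        finish_times[target].append(tick + duration)
        peak[target] = max(peak[target], len(finish_times[target]))
    return peak


backends = ["node-a", "node-b"]
# Every other request is slow (10 ticks); the rest finish in 1 tick
durations = [10, 1] * 6
rr = itertools.cycle(backends)
round_robin_peak = simulate(durations, backends, lambda active: next(rr))
least_conn_peak = simulate(durations, backends, pick_least_connections)
print("round robin:", round_robin_peak)
print("least connections:", least_conn_peak)
```

With this traffic pattern, round-robin sends every slow request to the same node, while least-connections keeps the peak load on the two nodes balanced.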

Instance Configuration versioning in Terraform requires care. The version number appears in the display_name (orders-api-instance-config-v${var.app_version}), not in the Terraform resource address, and changing the launch details of an existing oci_core_instance_configuration typically makes Terraform replace that resource rather than keep the old one. To preserve the old configuration, declare the new version as a separate resource block, as in Step 7. You can then roll back by pointing the pool back to the previous configuration OCID if the new version has problems.

The minimum pool size of 2 placed across two availability domains means you always have at least one instance in each AD. A complete outage of one availability domain still leaves the pool functional. Set your minimum to at least 2 and spread placement across ADs for any production workload.

Regards,
Osama

Published by Osama Mustafa
