Overview

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods based on observed CPU utilization or custom metrics. Orphelix provides real-time monitoring of HPA status, metrics, and scaling behavior.

What is HPA?

HPA automatically adjusts the number of pod replicas in a Deployment, StatefulSet, or ReplicaSet based on resource utilization:

Scale Up

Add more replicas when CPU/memory usage is high

Scale Down

Remove replicas when resource usage is low

Automatic

No manual intervention required

Cost Efficient

Pay only for resources you need

List View

Features

  • Real-time Metrics: Live CPU utilization percentages
  • Replica Counts: Current, min, and max replicas
  • Target Status: Whether HPA is meeting targets
  • Scaling Activity: Recent scale up/down events

Table Columns

Column    Description
Name      HPA name (clickable to view details)
Target    Resource being scaled (Deployment/StatefulSet)
Min/Max   Minimum and maximum replica limits
Current   Current number of replicas
CPU       Current CPU utilization vs. target (e.g., “45% / 80%”)
Status    Scaling status indicator
Age       Time since HPA creation

Status Indicators

Green badge: HPA is monitoring and scaling normally.
  • Metrics are being collected
  • Scaling decisions are being made
  • Target resource is healthy

Red badge: HPA cannot scale the target. Common causes:
  • Target resource doesn’t exist
  • Metrics unavailable
  • Insufficient permissions
  • Target at max/min replicas

Yellow badge: HPA is actively scaling replicas.
  • Scale up/down in progress
  • Waiting for new pods to become ready
  • Cooldown period active

Gray badge: HPA status cannot be determined. Possible reasons:
  • Just created
  • Metrics server not available
  • Connection issues

Detail View

Click any HPA name to view comprehensive details:

Overview Section

1. Basic Information

  • Name: HPA identifier
  • Namespace: Current namespace
  • Target: Resource being scaled (with link)
  • Status: Current HPA status
  • Created: Creation timestamp
2. Replica Configuration

  • Min Replicas: Minimum pod count (never scale below)
  • Max Replicas: Maximum pod count (never scale above)
  • Current Replicas: Current number of running pods
  • Desired Replicas: Target replica count based on metrics
3. Metrics

  • Target Metric: CPU or custom metric name
  • Target Value: Threshold for scaling (e.g., 80% CPU)
  • Current Value: Actual measured value
  • Utilization %: Current as percentage of target

Scaling Metrics

HPA Metrics

CPU Utilization

Most common HPA metric: the desired average CPU across all pods.
Example: with a target of 80%, the average CPU across all pods should stay near ~80%.
CPU progress bar:
Current: 45% ████████░░░░░░░░░░ Target: 80%
Status: under target, so the HPA may scale down
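The number the HPA aims for comes from the standard autoscaling formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick worked example using shell integer arithmetic (the replica count and utilization figures are made up for illustration):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Example: 4 replicas averaging 90% CPU against an 80% target.
current_replicas=4
current_cpu=90
target_cpu=80
# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 5 -> the HPA would scale up from 4 to 5 replicas
```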

Memory Utilization

HPA can also scale based on memory:
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
Memory-based scaling requires metrics-server to be running in the cluster

Custom Metrics

HPA supports custom metrics from:
  • Prometheus
  • Application metrics
  • Queue depth
  • Request rate
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"

Scaling Behavior

View recent scaling activity:
1. Scale Up Events

When HPA added replicas:
  • Timestamp of scale up
  • Previous replica count
  • New replica count
  • Reason (CPU above target, etc.)
2. Scale Down Events

When HPA removed replicas:
  • Timestamp of scale down
  • Previous replica count
  • New replica count
  • Reason (CPU below target, etc.)
3. Cooldown Periods

HPA waits between scaling operations:
  • Scale Up: no stabilization window by default (older Kubernetes releases applied a 3-minute upscale delay)
  • Scale Down: 5-minute stabilization window by default
This prevents thrashing from metric fluctuations.

Conditions

HPA conditions indicate scaling health:
Condition       Status       Meaning
AbleToScale     True/False   Can the HPA scale the target?
ScalingActive   True/False   Is the HPA actively monitoring metrics?
ScalingLimited  True/False   Is the HPA at its min or max replica limit?
If ScalingLimited is True, HPA cannot scale further. Consider adjusting limits or adding more nodes.
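These conditions live under status.conditions on the HPA object. An illustrative healthy status (values are examples, not captured output; the reason strings shown are the common ones emitted by the HPA controller):

```yaml
# Illustrative HPA status.conditions; values are examples, not real cluster output
status:
  conditions:
  - type: AbleToScale
    status: "True"
    reason: ReadyForNewScale
  - type: ScalingActive
    status: "True"
    reason: ValidMetricFound
  - type: ScalingLimited
    status: "False"
    reason: DesiredWithinRange
```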

Events

Recent HPA events:
  • SuccessfulRescale: Scaled from X to Y replicas
  • FailedGetResourceMetric: Cannot fetch CPU metrics
  • FailedComputeMetricsReplicas: Cannot calculate desired replicas
  • FailedRescale: Scale operation failed

Creating an HPA

Using kubectl

Scale based on CPU:
kubectl autoscale deployment myapp \
  --cpu-percent=80 \
  --min=2 \
  --max=10
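The kubectl autoscale command above is roughly equivalent to applying this autoscaling/v2 manifest (a sketch; myapp is a placeholder Deployment name):

```yaml
# Equivalent HPA manifest (sketch); "myapp" and the numbers mirror the command above
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```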

Prerequisites

Install metrics-server in your cluster:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify it’s running:
kubectl top nodes
kubectl top pods
Pods must have CPU/memory requests defined:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
HPA cannot function without resource requests!
Deployment/StatefulSet must exist and be healthy:
kubectl get deployment myapp

Scaling Behavior Configuration

Control how fast HPA scales up/down:

Scale Up Behavior

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # No delay
    policies:
    - type: Percent
      value: 100  # Double replicas
      periodSeconds: 15  # Every 15 seconds
    - type: Pods
      value: 4  # Or add 4 pods
      periodSeconds: 15
    selectPolicy: Max  # Use policy allowing most replicas
Aggressive scale up:
  • No stabilization window
  • Can double capacity quickly
  • Good for traffic spikes

Scale Down Behavior

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 minute window
    policies:
    - type: Percent
      value: 50  # Remove 50% of replicas
      periodSeconds: 60
    - type: Pods
      value: 2  # Or remove 2 pods
      periodSeconds: 60
    selectPolicy: Min  # Use policy removing fewest replicas
Conservative scale down:
  • 5 minute stabilization
  • Gradual reduction
  • Prevents flapping

Best Practices

Choose appropriate min/max:
minReplicas: 2  # Maintain availability
maxReplicas: 10  # Limit blast radius
  • Min: Ensure availability during scale down
  • Max: Prevent runaway scaling costs
Always set CPU/memory requests:
resources:
  requests:
    cpu: "250m"  # 25% of a CPU core
    memory: "256Mi"
Base on actual usage, not maximums
Set targets with headroom:
  • CPU: 70-80% (not 100%)
  • Memory: 70-80% (not 100%)
Allows time to scale before saturation
Prevent flapping with stabilization:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
Wait for sustained low usage before scaling down
Watch for frequent scaling:
kubectl get events --field-selector involvedObject.name=myapp-hpa
Frequent scaling indicates wrong targets
Load test your application:
  1. Generate load
  2. Watch HPA scale up
  3. Remove load
  4. Watch HPA scale down
  5. Adjust targets as needed
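For step 1, a disposable load-generator Pod is a common way to generate traffic. A sketch, assuming your app is reachable inside the cluster at http://myapp (the image tag and URL are placeholders for your own service):

```yaml
# Hypothetical load generator; replace http://myapp with your Service URL
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox:1.36
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://myapp; done"]
```

Delete the Pod once the test is done and watch the HPA scale back down.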

Troubleshooting

HPA Not Scaling

Symptom: replicas stay constant despite high CPU. Check:

1. Verify Metrics Server

kubectl top nodes
kubectl top pods -n <namespace>
If error: Metrics server not installed/running
2. Check Resource Requests

kubectl get deployment myapp -o yaml | grep -A 5 resources
Must have CPU requests defined
3. View HPA Status

kubectl describe hpa myapp-hpa
Look for error messages in conditions
4. Check Current Metrics

kubectl get hpa myapp-hpa
TARGETS column shows current vs target

Metrics Unavailable

Symptom: HPA shows “unknown” for current CPU.
Solutions:
  1. Restart Metrics Server
    kubectl rollout restart deployment metrics-server -n kube-system
    
  2. Check Metrics Server Logs
    kubectl logs -n kube-system -l k8s-app=metrics-server
    
  3. Verify Kubelet Metrics
    curl -k https://<node-ip>:10250/metrics
    

Constant Scaling (Flapping)

Symptom: HPA constantly scales up and down.
Causes:
  • Target too close to actual usage
  • Insufficient stabilization window
  • Application resource usage varies wildly
Solutions:
  1. Increase Target Threshold
    averageUtilization: 80  # Was 60, now 80
    
  2. Add Stabilization Window
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
    
  3. Adjust Scale Down Policy
    policies:
    - type: Percent
      value: 25  # Scale down slowly (was 50%)
      periodSeconds: 60
    

Hitting Max Replicas

Symptom: HPA at max replicas, but CPU still high.
Solutions:
  1. Increase Max Replicas
    maxReplicas: 20  # Was 10
    
  2. Add More Nodes: scale the cluster to accommodate more pods
  3. Optimize Application: reduce CPU usage per request
  4. Increase Pod Resources
    resources:
      requests:
        cpu: "500m"  # Was 250m