Overview

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods based on observed CPU utilization or custom metrics. Orphelix provides real-time monitoring of HPA status, metrics, and scaling behavior.

What is HPA?

HPA automatically adjusts the number of pod replicas in a Deployment, StatefulSet, or ReplicaSet based on resource utilization:

Scale Up

Add more replicas when CPU/memory usage is high

Scale Down

Remove replicas when resource usage is low

Automatic

No manual intervention required

Cost Efficient

Pay only for resources you need

List View

Features

  • Real-time Metrics: Live CPU utilization percentages
  • Replica Counts: Current, min, and max replicas
  • Target Status: Whether HPA is meeting targets
  • Scaling Activity: Recent scale up/down events

Table Columns

Column    Description
Name      HPA name (clickable to view details)
Target    Resource being scaled (Deployment/StatefulSet)
Min/Max   Minimum and maximum replica limits
Current   Current number of replicas
CPU       Current CPU utilization vs. target (e.g., “45% / 80%”)
Status    Scaling status indicator
Age       Time since HPA creation

Status Indicators

Green badge: HPA is monitoring and scaling normally.
  • Metrics are being collected
  • Scaling decisions are being made
  • Target resource is healthy

Red badge: HPA cannot scale the target. Common causes:
  • Target resource doesn’t exist
  • Metrics unavailable
  • Insufficient permissions
  • Target at max/min replicas

Yellow badge: HPA is actively scaling replicas.
  • Scale up/down in progress
  • Waiting for new pods to become ready
  • Cooldown period active

Gray badge: HPA status cannot be determined. Possible reasons:
  • Just created
  • Metrics server not available
  • Connection issues

Detail View

Click any HPA name to view comprehensive details:

Overview Section

1. Basic Information

  • Name: HPA identifier
  • Namespace: Current namespace
  • Target: Resource being scaled (with link)
  • Status: Current HPA status
  • Created: Creation timestamp
2. Replica Configuration

  • Min Replicas: Minimum pod count (never scale below)
  • Max Replicas: Maximum pod count (never scale above)
  • Current Replicas: Current number of running pods
  • Desired Replicas: Target replica count based on metrics
3. Metrics

  • Target Metric: CPU or custom metric name
  • Target Value: Threshold for scaling (e.g., 80% CPU)
  • Current Value: Actual measured value
  • Utilization %: Current as percentage of target

Scaling Metrics

HPA Metrics

CPU Utilization

Most common HPA metric: the desired average CPU across all pods.
Example: with a target of 80%, the average CPU across all pods should stay near ~80%.
CPU progress bar:
Current: 45% ████████░░░░░░░░░░ Target: 80%
Status: under target, so the HPA may scale down
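The number the HPA aims for comes from the standard autoscaling formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick worked example using shell integer arithmetic (the replica count and utilization figures are made up for illustration):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Example: 4 replicas averaging 90% CPU against an 80% target.
current_replicas=4
current_cpu=90
target_cpu=80
# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 5 -> the HPA would scale up from 4 to 5 replicas
```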

Memory Utilization

HPA can also scale based on memory:
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
Memory-based scaling requires metrics-server to be running in the cluster

Custom Metrics

HPA supports custom metrics from:
  • Prometheus
  • Application metrics
  • Queue depth
  • Request rate
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"

Scaling Behavior

View recent scaling activity:
1. Scale Up Events

When HPA added replicas:
  • Timestamp of scale up
  • Previous replica count
  • New replica count
  • Reason (CPU above target, etc.)
2. Scale Down Events

When HPA removed replicas:
  • Timestamp of scale down
  • Previous replica count
  • New replica count
  • Reason (CPU below target, etc.)
3. Cooldown Periods

HPA waits between scaling operations:
  • Scale Up: no stabilization window by default (older Kubernetes releases applied a 3-minute upscale delay)
  • Scale Down: 5-minute stabilization window by default
This prevents thrashing from metric fluctuations.

Conditions

HPA conditions indicate scaling health:
Condition       Status       Meaning
AbleToScale     True/False   Can the HPA scale the target?
ScalingActive   True/False   Is the HPA actively monitoring metrics?
ScalingLimited  True/False   Is the HPA at its min or max replica limit?
If ScalingLimited is True, HPA cannot scale further. Consider adjusting limits or adding more nodes.
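These conditions live under status.conditions on the HPA object. An illustrative healthy status (values are examples, not captured output; the reason strings shown are the common ones emitted by the HPA controller):

```yaml
# Illustrative HPA status.conditions; values are examples, not real cluster output
status:
  conditions:
  - type: AbleToScale
    status: "True"
    reason: ReadyForNewScale
  - type: ScalingActive
    status: "True"
    reason: ValidMetricFound
  - type: ScalingLimited
    status: "False"
    reason: DesiredWithinRange
```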

Events

Recent HPA events:
  • SuccessfulRescale: Scaled from X to Y replicas
  • FailedGetResourceMetric: Cannot fetch CPU metrics
  • FailedComputeMetricsReplicas: Cannot calculate desired replicas
  • FailedRescale: Scale operation failed

Creating an HPA

Using kubectl

Scale based on CPU:
kubectl autoscale deployment myapp \
  --cpu-percent=80 \
  --min=2 \
  --max=10
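The kubectl autoscale command above is roughly equivalent to applying this autoscaling/v2 manifest (a sketch; myapp is a placeholder Deployment name):

```yaml
# Equivalent HPA manifest (sketch); "myapp" and the numbers mirror the command above
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```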

Prerequisites

Install metrics-server in your cluster:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify it’s running:
kubectl top nodes
kubectl top pods
Pods must have CPU/memory requests defined:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
HPA cannot function without resource requests!
Deployment/StatefulSet must exist and be healthy:
kubectl get deployment myapp

Scaling Behavior Configuration

Control how fast HPA scales up/down:

Scale Up Behavior

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # No delay
    policies:
    - type: Percent
      value: 100  # Double replicas
      periodSeconds: 15  # Every 15 seconds
    - type: Pods
      value: 4  # Or add 4 pods
      periodSeconds: 15
    selectPolicy: Max  # Use policy allowing most replicas
Aggressive scale up:
  • No stabilization window
  • Can double capacity quickly
  • Good for traffic spikes

Scale Down Behavior

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 minute window
    policies:
    - type: Percent
      value: 50  # Remove 50% of replicas
      periodSeconds: 60
    - type: Pods
      value: 2  # Or remove 2 pods
      periodSeconds: 60
    selectPolicy: Min  # Use policy removing fewest replicas
Conservative scale down:
  • 5 minute stabilization
  • Gradual reduction
  • Prevents flapping

Best Practices

Choose appropriate min/max:
minReplicas: 2  # Maintain availability
maxReplicas: 10  # Limit blast radius
  • Min: Ensure availability during scale down
  • Max: Prevent runaway scaling costs
Always set CPU/memory requests:
resources:
  requests:
    cpu: "250m"  # 25% of a CPU core
    memory: "256Mi"
Base on actual usage, not maximums
Set targets with headroom:
  • CPU: 70-80% (not 100%)
  • Memory: 70-80% (not 100%)
Allows time to scale before saturation
Prevent flapping with stabilization:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
Wait for sustained low usage before scaling down
Watch for frequent scaling:
kubectl get events --field-selector involvedObject.name=myapp-hpa
Frequent scaling indicates wrong targets
Load test your application:
  1. Generate load
  2. Watch HPA scale up
  3. Remove load
  4. Watch HPA scale down
  5. Adjust targets as needed
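For step 1, a disposable load-generator Pod is a common way to generate traffic. A sketch, assuming your app is reachable inside the cluster at http://myapp (the image tag and URL are placeholders for your own service):

```yaml
# Hypothetical load generator; replace http://myapp with your Service URL
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox:1.36
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://myapp; done"]
```

Delete the Pod once the test is done and watch the HPA scale back down.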

Troubleshooting

HPA Not Scaling

Symptom: replicas stay constant despite high CPU. Check:

1. Verify Metrics Server

kubectl top nodes
kubectl top pods -n <namespace>
If error: Metrics server not installed/running
2. Check Resource Requests

kubectl get deployment myapp -o yaml | grep -A 5 resources
Must have CPU requests defined
3. View HPA Status

kubectl describe hpa myapp-hpa
Look for error messages in conditions
4. Check Current Metrics

kubectl get hpa myapp-hpa
TARGETS column shows current vs target

Metrics Unavailable

Symptom: HPA shows “unknown” for current CPU.
Solutions:
  1. Restart Metrics Server
    kubectl rollout restart deployment metrics-server -n kube-system
    
  2. Check Metrics Server Logs
    kubectl logs -n kube-system -l k8s-app=metrics-server
    
  3. Verify Kubelet Metrics
    curl -k https://<node-ip>:10250/metrics
    

Constant Scaling (Flapping)

Symptom: HPA constantly scales up and down.
Causes:
  • Target too close to actual usage
  • Insufficient stabilization window
  • Application resource usage varies wildly
Solutions:
  1. Increase Target Threshold
    averageUtilization: 80  # Was 60, now 80
    
  2. Add Stabilization Window
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
    
  3. Adjust Scale Down Policy
    policies:
    - type: Percent
      value: 25  # Scale down slowly (was 50%)
      periodSeconds: 60
    

Hitting Max Replicas

Symptom: HPA at max replicas, but CPU still high.
Solutions:
  1. Increase Max Replicas
    maxReplicas: 20  # Was 10
    
  2. Add More Nodes: scale the cluster to accommodate more pods
  3. Optimize Application: reduce CPU usage per request
  4. Increase Pod Resources
    resources:
      requests:
        cpu: "500m"  # Was 250m