Overview
Horizontal Pod Autoscaler (HPA) automatically scales the number of pods based on observed CPU utilization or custom metrics. Orphelix provides real-time monitoring of HPA status, metrics, and scaling behavior.
What is HPA?
HPA automatically adjusts the number of pod replicas in a Deployment, StatefulSet, or ReplicaSet based on resource utilization:
- Scale Up: add more replicas when CPU/memory usage is high
- Scale Down: remove replicas when resource usage is low
- Automatic: no manual intervention required
- Cost Efficient: pay only for the resources you need
List View
Features
- Real-time Metrics: Live CPU utilization percentages
- Replica Counts: Current, min, and max replicas
- Target Status: Whether HPA is meeting targets
- Scaling Activity: Recent scale up/down events
Table Columns
| Column | Description |
|---|---|
| Name | HPA name (clickable to view details) |
| Target | Resource being scaled (Deployment/StatefulSet) |
| Min/Max | Minimum and maximum replica limits |
| Current | Current number of replicas |
| CPU | Current CPU utilization vs target (e.g., "45% / 80%") |
| Status | Scaling status indicator |
| Age | Time since HPA creation |
Status Indicators
Active
HPA is monitoring and scaling normally.
Indicator: Green badge
Means:
- Metrics are being collected
- Scaling decisions are being made
- Target resource is healthy
Unable to Scale
HPA cannot scale the target.
Indicator: Red badge
Common causes:
- Target resource doesn’t exist
- Metrics unavailable
- Insufficient permissions
- Target at max/min replicas
Scaling
HPA is actively scaling replicas.
Indicator: Yellow badge
Means:
- Scale up/down in progress
- Waiting for new pods to become ready
- Cooldown period active
Unknown
HPA status cannot be determined.
Indicator: Gray badge
Possible reasons:
- Just created
- Metrics server not available
- Connection issues
Detail View
Click any HPA name to view comprehensive details.
Overview Section
1. Basic Information
- Name: HPA identifier
- Namespace: Current namespace
- Target: Resource being scaled (with link)
- Status: Current HPA status
- Created: Creation timestamp
2. Replica Configuration
- Min Replicas: Minimum pod count (never scale below)
- Max Replicas: Maximum pod count (never scale above)
- Current Replicas: Current number of running pods
- Desired Replicas: Target replica count based on metrics
3. Metrics
- Target Metric: CPU or custom metric name
- Target Value: Threshold for scaling (e.g., 80% CPU)
- Current Value: Actual measured value
- Utilization %: Current as percentage of target
Scaling Metrics
CPU Utilization
The most common HPA metric:
- Target Utilization: desired average CPU across all pods (example: 80%)
- Current Utilization: actual measured average CPU across all pods
- Scaling Decision: desired replica count calculated from current vs. target utilization
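The scaling decision follows the standard Kubernetes HPA formula: desired replicas scale with the ratio of current to target utilization. For example, 4 replicas running at 90% CPU against an 80% target scale up to 5:

```text
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
                = ceil(4 * 90 / 80)
                = ceil(4.5)
                = 5
```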
Memory Utilization
HPA can also scale based on memory. Memory-based scaling requires metrics-server to be running in the cluster.
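A minimal sketch of a memory target in an autoscaling/v2 HPA spec (the 80% value is illustrative):

```yaml
# Memory-based resource metric in an autoscaling/v2 HPA (illustrative value)
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80
```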
Custom Metrics
HPA supports custom metrics from sources such as:
- Prometheus
- Application metrics
- Queue depth
- Request rate
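As a sketch, a Pods-type custom metric in autoscaling/v2; the metric name here is hypothetical, and custom metrics require a metrics adapter (for example, prometheus-adapter) to be installed in the cluster:

```yaml
# Pods-type custom metric (hypothetical metric name; requires a custom metrics adapter)
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"
```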
Scaling Behavior
View recent scaling activity:
1. Scale Up Events
When HPA added replicas:
- Timestamp of scale up
- Previous replica count
- New replica count
- Reason (CPU above target, etc.)
2. Scale Down Events
When HPA removed replicas:
- Timestamp of scale down
- Previous replica count
- New replica count
- Reason (CPU below target, etc.)
3. Cooldown Periods
HPA waits between scaling operations:
- Scale Up: 3 minutes default
- Scale Down: 5 minutes default
Conditions
HPA conditions indicate scaling health:

| Condition | Status | Meaning |
|---|---|---|
| AbleToScale | True/False | Can HPA scale the target? |
| ScalingActive | True/False | Is HPA actively monitoring? |
| ScalingLimited | True/False | At min or max replica limit? |
Events
Recent HPA events:
- SuccessfulRescale: Scaled from X to Y replicas
- FailedGetResourceMetric: Cannot fetch CPU metrics
- FailedComputeMetricsReplicas: Cannot calculate desired replicas
- FailedRescale: Scale operation failed
Creating an HPA
Using kubectl
An HPA can be created in several ways: a simple CPU-based HPA with kubectl, a YAML manifest, memory-based scaling, or multiple metrics combined.

Scale based on CPU:
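For example, a minimal CPU-based HPA created with kubectl autoscale (deployment name and limits are placeholders), followed by the equivalent autoscaling/v2 manifest:

```bash
# Create an HPA targeting 80% CPU, scaling my-deployment between 2 and 10 replicas
kubectl autoscale deployment my-deployment --cpu-percent=80 --min=2 --max=10
```

```yaml
# Equivalent autoscaling/v2 manifest (names and values are placeholders)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```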
Prerequisites
Metrics Server
Install metrics-server in your cluster, then verify it is running.
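A typical install and check, assuming the upstream metrics-server manifest (adjust for your distribution):

```bash
# Install metrics-server from the upstream manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it is running and serving metrics
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
```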
Resource Requests
Pods must have CPU/memory requests defined:
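For example, a container spec with requests set (values are illustrative); HPA computes percentage utilization relative to these requests:

```yaml
# Container resources: HPA percentage targets are relative to these requests
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```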
Target Resource
The target Deployment/StatefulSet must exist and be healthy:
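A quick check, with placeholder names:

```bash
# Confirm the scale target exists and its pods are healthy (names are placeholders)
kubectl get deployment my-deployment
kubectl get pods -l app=my-deployment
```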
Scaling Behavior Configuration
Control how fast HPA scales up or down; a sample behavior stanza follows the lists below.

Scale Up Behavior
- No stabilization window
- Can double capacity quickly
- Good for traffic spikes
Scale Down Behavior
- 5 minute stabilization
- Gradual reduction
- Prevents flapping
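A sketch of the behavior stanza in an autoscaling/v2 HPA matching the settings above (values are illustrative):

```yaml
# autoscaling/v2 behavior: fast scale up, conservative scale down (illustrative values)
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react immediately to traffic spikes
    policies:
    - type: Percent
      value: 100                      # allow doubling per period
      periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300   # 5-minute stabilization window
    policies:
    - type: Percent
      value: 10                       # remove at most 10% of replicas per period
      periodSeconds: 60
```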
Best Practices
Set Reasonable Limits
Choose appropriate min/max:
- Min: Ensure availability during scale down
- Max: Prevent runaway scaling costs
Define Resource Requests
Always set CPU/memory requests, based on actual usage rather than maximums.
Use Appropriate Targets
Set targets with headroom:
- CPU: 70-80% (not 100%)
- Memory: 70-80% (not 100%)
Configure Scale Down Delay
Prevent flapping with a stabilization window: wait for sustained low usage before scaling down.
Monitor Scaling Events
Watch for frequent scaling; repeated scale up/down cycles usually indicate poorly chosen targets. Example commands follow below.
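For example (the HPA name is a placeholder):

```bash
# Inspect an HPA's conditions and recent events
kubectl describe hpa my-deployment-hpa

# List HPA-related events, newest last
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler --sort-by=.lastTimestamp
```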
Test Scaling Behavior
Load test your application (see the sketch after this list):
- Generate load
- Watch HPA scale up
- Remove load
- Watch HPA scale down
- Adjust targets as needed
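A simple load-test loop, adapted from the common Kubernetes HPA walkthrough pattern (the service name is a placeholder):

```bash
# Generate load against the service, then watch the HPA react
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://my-service; done"

# In another terminal, watch replica counts change
kubectl get hpa -w
```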
Troubleshooting
HPA Not Scaling
Symptom: Replicas stay constant despite high CPU.
Check the following (commands sketched after this list):
1. Verify Metrics Server
2. Check Resource Requests
3. View HPA Status
4. Check Current Metrics
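One possible set of commands for these checks (resource names are placeholders):

```bash
# 1. Verify metrics-server is running
kubectl get deployment metrics-server -n kube-system

# 2. Check that the target pods define CPU/memory requests
kubectl get deployment my-deployment -o jsonpath='{.spec.template.spec.containers[*].resources}'

# 3. View HPA status, conditions, and events
kubectl describe hpa my-hpa

# 4. Check current pod metrics
kubectl top pods -l app=my-deployment
```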
Metrics Unavailable
Symptom: HPA shows “unknown” for current CPU.
Solutions (example commands below):
- Restart Metrics Server
- Check Metrics Server Logs
- Verify Kubelet Metrics
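Possible commands for each step, assuming the default metrics-server install in kube-system:

```bash
# Restart metrics-server
kubectl rollout restart deployment metrics-server -n kube-system

# Check metrics-server logs for errors
kubectl logs -n kube-system deployment/metrics-server

# Verify node metrics are being collected from the kubelets
kubectl top nodes
```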
Constant Scaling (Flapping)
Symptom: HPA constantly scales up and down.
Causes:
- Target too close to actual usage
- Insufficient stabilization window
- Application resource usage varies wildly
Solutions:
- Increase Target Threshold
- Add Stabilization Window
- Adjust Scale Down Policy
Hitting Max Replicas
Symptom: HPA is at max replicas, but CPU is still high.
Solutions (see the example after this list):
- Increase Max Replicas
- Add More Nodes: scale the cluster to accommodate more pods
- Optimize Application: reduce CPU usage per request
- Increase Pod Resources
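For example, raising the ceiling on an existing HPA (the name and value are placeholders):

```bash
# Raise maxReplicas on an existing HPA
kubectl patch hpa my-hpa --type merge -p '{"spec":{"maxReplicas":20}}'
```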