# Kubernetes Resource Management: Stop Guessing, Start Measuring
"Just set requests and limits" they said. "It'll be fine" they said.
## The Real Problem
Most teams either:
- Under-provision (OOMKilled pods, throttled CPU)
- Over-provision (wasted money, 20% cluster utilization)
## Understanding Resources
```yaml
resources:
  requests:        # Scheduling guarantee
    memory: "256Mi"
    cpu: "250m"
  limits:          # Hard ceiling
    memory: "512Mi"
    cpu: "500m"
```
Requests: "I need at least this much" Limits: "Never give me more than this"
## The Data-Driven Approach
### Step 1: Observe
```promql
# Actual CPU usage vs requests
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu",namespace="production"})
```
If this ratio is consistently below 50%, you're over-provisioned on CPU.
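Same check for memory. This sketch assumes kube-state-metrics is installed (it exports `kube_pod_container_resource_requests`) and that cAdvisor metrics are being scraped:

```promql
# Actual memory usage vs requests
sum(container_memory_working_set_bytes{namespace="production", container!=""})
/
sum(kube_pod_container_resource_requests{resource="memory", namespace="production"})
```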
### Step 2: Right-Size
Use the Vertical Pod Autoscaler (VPA) in recommendation mode:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Just recommendations
```
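Give it some real traffic to watch, then read the recommendations with `kubectl describe vpa my-app-vpa` (or `kubectl get vpa my-app-vpa -o yaml`). The status section looks roughly like this; the numbers here are illustrative:

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: my-app
      lowerBound:        # the least the container has needed
        cpu: 120m
        memory: 200Mi
      target:            # what the VPA would set as the request
        cpu: 180m
        memory: 310Mi
      upperBound:        # headroom for rare spikes
        cpu: 800m
        memory: 650Mi
```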
### Step 3: Automate
Once you trust the recommendations, enable auto-updates:
```yaml
updatePolicy:
  updateMode: "Auto"
resourcePolicy:
  containerPolicies:
  - containerName: "*"
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: 2
      memory: 2Gi
```
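One caveat: in `Auto` mode the VPA applies new requests by evicting pods and letting the controller recreate them, so pair it with a PodDisruptionBudget to keep those rollovers graceful. A minimal sketch, assuming your pods carry an `app: my-app` label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1          # keep at least one replica up during VPA evictions
  selector:
    matchLabels:
      app: my-app          # assumed label; match your Deployment's pod labels
```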
## Memory vs CPU: Different Strategies
**Memory:**
- Set limits = requests (avoid OOMKilled surprises)
- Profile your app under load
- Account for GC spikes in Java/Go
**CPU:**
- Limits can be higher than requests
- Burstable QoS is often fine
- Avoid CPU limits on latency-sensitive services; throttling hurts tail latency (see the combined sketch below)
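Putting both strategies into one container spec, a minimal sketch looks like this: memory limit pinned to the request, CPU request set from observed usage, and no CPU limit so the container can burst into idle capacity.

```yaml
resources:
  requests:
    memory: "512Mi"   # sized from profiling under load, with room for GC spikes
    cpu: "250m"       # sized from observed usage (the Step 1 queries)
  limits:
    memory: "512Mi"   # equal to the request, per the memory strategy above
    # no cpu limit: the container can burst above 250m when the node has spare cycles
```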
## Our Numbers
After implementing this:
- Cluster utilization: 23% → 61%
- Monthly cost: -$4,200
- OOMKilled incidents: -89%
Measure, don't guess.