# Kubernetes Resource Management: Stop Guessing, Start Measuring
"Just set requests and limits" they said. "It'll be fine" they said.
## The Real Problem
Most teams either:
- Under-provision (OOMKilled pods, throttled CPU)
- Over-provision (wasted money, 20% cluster utilization)
## Understanding Resources
```yaml
resources:
  requests:        # Scheduling guarantee
    memory: "256Mi"
    cpu: "250m"
  limits:          # Hard ceiling
    memory: "512Mi"
    cpu: "500m"
```
Requests: "I need at least this much" Limits: "Never give me more than this"
## The Data-Driven Approach
### Step 1: Observe
```promql
# Actual CPU usage vs requests
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu",namespace="production"})
```
If this ratio is consistently below 50%, you're over-provisioned on CPU.
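Same check for memory. This sketch assumes kube-state-metrics is installed (it exports `kube_pod_container_resource_requests`) and that cAdvisor metrics are being scraped:

```promql
# Actual memory usage vs requests
sum(container_memory_working_set_bytes{namespace="production", container!=""})
/
sum(kube_pod_container_resource_requests{resource="memory", namespace="production"})
```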
### Step 2: Right-Size
Use the Vertical Pod Autoscaler (VPA) in recommendation mode:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Just recommendations
```
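Give it some real traffic to watch, then read the recommendations with `kubectl describe vpa my-app-vpa` (or `kubectl get vpa my-app-vpa -o yaml`). The status section looks roughly like this; the numbers here are illustrative:

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: my-app
      lowerBound:        # the least the container has needed
        cpu: 120m
        memory: 200Mi
      target:            # what the VPA would set as the request
        cpu: 180m
        memory: 310Mi
      upperBound:        # headroom for rare spikes
        cpu: 800m
        memory: 650Mi
```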
### Step 3: Automate
Once you trust the recommendations, enable auto-updates:
```yaml
updatePolicy:
  updateMode: "Auto"
resourcePolicy:
  containerPolicies:
  - containerName: "*"
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: 2
      memory: 2Gi
```
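One caveat: in `Auto` mode the VPA applies new requests by evicting pods and letting the controller recreate them, so pair it with a PodDisruptionBudget to keep those rollovers graceful. A minimal sketch, assuming your pods carry an `app: my-app` label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1          # keep at least one replica up during VPA evictions
  selector:
    matchLabels:
      app: my-app          # assumed label; match your Deployment's pod labels
```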
## Memory vs CPU: Different Strategies
**Memory:**
- Set limits = requests (avoid OOMKilled surprises)
- Profile your app under load
- Account for GC spikes in Java/Go
**CPU:**
- Limits can be higher than requests
- Burstable QoS is often fine
- Avoid CPU limits on latency-sensitive services; throttling hurts tail latency (see the combined sketch below)
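Putting both strategies into one container spec, a minimal sketch looks like this: memory limit pinned to the request, CPU request set from observed usage, and no CPU limit so the container can burst into idle capacity.

```yaml
resources:
  requests:
    memory: "512Mi"   # sized from profiling under load, with room for GC spikes
    cpu: "250m"       # sized from observed usage (the Step 1 queries)
  limits:
    memory: "512Mi"   # equal to the request, per the memory strategy above
    # no cpu limit: the container can burst above 250m when the node has spare cycles
```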
## Our Numbers
After implementing this:
- Cluster utilization: 23% → 61%
- Monthly cost: -$4,200
- OOMKilled incidents: -89%
Measure, don't guess.