Kubernetes workload patterns, resource management, RBAC, probes, autoscaling, ConfigMap/Secret handling, and kubectl debugging for production-grade deployments.
---
name: kubernetes-patterns
description: Kubernetes workload patterns, resource management, RBAC, probes, autoscaling, ConfigMap/Secret handling, and kubectl debugging for production-grade deployments.
origin: ECC
---
# Kubernetes Patterns
Production-grade Kubernetes patterns for deploying, managing, and debugging workloads reliably.
## When to Activate
- Writing Kubernetes manifests (Deployments, Services, Ingress, Jobs)
- Configuring resource requests/limits, liveness/readiness probes
- Setting up RBAC, namespaces, or ServiceAccounts
- Managing configuration and secrets in K8s
- Debugging CrashLoopBackOff, OOMKilled, pending pods, or image pull errors
- Configuring HPA (Horizontal Pod Autoscaler) or PodDisruptionBudgets
- Reviewing K8s YAML for security or correctness
## When to Use
> Same as **When to Activate** above. This alias satisfies repo skill-format conventions. Use this skill any time you are writing, reviewing, or debugging Kubernetes YAML and workloads.
## How It Works
This skill provides **copy-pasteable, production-grade YAML patterns** and **kubectl debugging commands** organized by task:
1. **Deployment template** — A fully configured production `Deployment` with security context, rolling update strategy, all three probe types, resource limits, and environment injection from ConfigMap/Secret.
2. **Probes** — Decision table for startup vs liveness vs readiness, with correct `failureThreshold × periodSeconds` math.
3. **Services & Ingress** — ClusterIP, LoadBalancer, and TLS Ingress patterns with cert-manager annotations.
4. **ConfigMaps & Secrets** — `envFrom`, file-mount, and external secrets guidance.
5. **Resource management** — Requests vs limits rules of thumb by workload type (web API, JVM, worker, sidecar).
6. **RBAC** — Least-privilege ServiceAccount → Role → RoleBinding chain.
7. **HPA & PDB** — Autoscaling and node-drain safety configurations.
8. **Jobs & CronJobs** — One-off and scheduled workload patterns with correct `restartPolicy`.
9. **kubectl cheatsheet** — Logs, exec, rollback, port-forward, dry-run, and common error diagnosis commands.
10. **Anti-patterns & checklist** — What NOT to do, and a security/reliability/observability checklist.
## Examples
See the sections below for complete, runnable examples. Quick references:
| Task | Jump to |
|------|---------|
| Full production Deployment YAML | [Core Workload Patterns](#core-workload-patterns) |
| Probe configuration | [Probes](#probes--liveness-readiness-startup) |
| RBAC least-privilege setup | [RBAC](#rbac--roles-and-serviceaccounts) |
| Debug a CrashLoopBackOff | [kubectl Debugging Cheatsheet](#kubectl-debugging-cheatsheet) |
| Autoscaling | [HPA](#horizontal-pod-autoscaler-hpa) |
---
## Core Workload Patterns
### Deployment — Production Template
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
namespace: my-namespace
labels:
app: my-app
version: "1.0.0"
spec:
replicas: 3
selector:
matchLabels:
app: my-app
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Allow 1 extra pod during update
maxUnavailable: 0 # Never reduce below desired count
template:
metadata:
labels:
app: my-app
version: "1.0.0"
spec:
# Security context at pod level
securityContext:
runAsNonRoot: true
runAsUser: 1001
fsGroup: 1001
# Graceful shutdown
terminationGracePeriodSeconds: 30
containers:
- name: my-app
image: ghcr.io/org/my-app:1.0.0 # Never use :latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
protocol: TCP
# Resource requests AND limits are both required
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
# Container security context
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
# Probes (see Probes section below)
startupProbe:
httpGet:
path: /health
port: 8080
failureThreshold: 30
periodSeconds: 5
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 0
periodSeconds: 30
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 2
# Environment from ConfigMap and Secret
envFrom:
- configMapRef:
name: my-app-config
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: my-app-secrets
key: db-password
# Writable tmp directory when readOnlyRootFilesystem: true
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
```
---
## Probes — Liveness, Readiness, Startup
Understanding when to use each probe is critical:
| Probe | Failure Action | Use For |
|-------|---------------|---------|
| `startupProbe` | Kills container if slow to start | Slow-starting apps (JVM, Python) |
| `livenessProbe` | Restarts container | Deadlock / hung process detection |
| `readinessProbe` | Removes from Service endpoints | Temporary unavailability (DB reconnect) |
```yaml
# Correct pattern: startupProbe covers slow startup,
# then liveness/readiness take over
startupProbe:
httpGet:
path: /health
port: 8080
failureThreshold: 30 # 30 * 5s = 150s max startup time
periodSeconds: 5
livenessProbe:
httpGet:
path: /health
port: 8080
periodSeconds: 30
failureThreshold: 3 # 3 * 30s = 90s before restart
readinessProbe:
httpGet:
path: /ready # Separate endpoint: checks DB, cache, etc.
port: 8080
periodSeconds: 10
failureThreshold: 2
```
```yaml
# WRONG: initialDelaySeconds without startupProbe
# If the app takes 60s to start, set a startupProbe instead
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 60 # BAD: Arbitrary wait, race condition
```
---
## Services and Ingress
### Service Types
```yaml
# ClusterIP (default) — internal-only
apiVersion: v1
kind: Service
metadata:
name: my-app
namespace: my-namespace
spec:
selector:
app: my-app
ports:
- port: 80
targetPort: 8080
protocol: TCP
type: ClusterIP
```
```yaml
# LoadBalancer — external traffic (cloud providers)
spec:
type: LoadBalancer
ports:
- port: 443
targetPort: 8080
```
### Ingress with TLS
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-app
namespace: my-namespace
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
ingressClassName: nginx
tls:
- hosts:
- myapp.example.com
secretName: my-app-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app
port:
number: 80
```
---
## ConfigMaps and Secrets
### ConfigMap — Non-sensitive configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: my-app-config
namespace: my-namespace
data:
LOG_LEVEL: "info"
APP_ENV: "production"
MAX_CONNECTIONS: "100"
# Mount as a file for complex config
app.yaml: |
server:
port: 8080
timeout: 30s
```
```yaml
# Mount ConfigMap as a file
volumes:
- name: config
configMap:
name: my-app-config
items:
- key: app.yaml
path: app.yaml
volumeMounts:
- name: config
mountPath: /etc/app
readOnly: true
```
### Secrets — Sensitive data
```bash
# Create secret from literal (CLI, then store in Vault/SOPS)
kubectl create secret generic my-app-secrets \
--from-literal=db-password='s3cr3t' \
--namespace=my-namespace \
--dry-run=client -o yaml | kubectl apply -f -
```
```yaml
apiVersion: v1
kind: Secret
metadata:
name: my-app-secrets
namespace: my-namespace
type: Opaque
# Values are base64-encoded (NOT encrypted — use Sealed Secrets or ESO for real encryption)
data:
db-password: czNjcjN0 # base64 of 's3cr3t'
```
> **Important:** Raw Kubernetes Secrets are only base64-encoded, not encrypted at rest unless your cluster has encryption configured. Use [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) or [External Secrets Operator](https://external-secrets.io) for production.
---
## Resource Requests and Limits
```yaml
resources:
requests: # Scheduler uses this to place the pod
cpu: "100m" # 100 millicores = 0.1 CPU
memory: "128Mi"
limits: # Container is killed/throttled above this
cpu: "500m"
memory: "256Mi"
```
**Rules of thumb:**
| Workload Type | CPU Request | Memory Request | Notes |
|---------------|-------------|----------------|-------|
| Web API | 100–250m | 128–256Mi | Set limits 2-4x requests |
| Worker/consumer | 250–500m | 256–512Mi | Memory limit = request for predictability |
| JVM app | 500m–1 | 512Mi–2Gi | Allow headroom above `-Xmx` for JVM overhead |
| Sidecar | 10–50m | 32–64Mi | Keep minimal |
```yaml
# WRONG: No requests or limits — unpredictable scheduling, OOM evictions
containers:
- name: app
image: myapp:latest
# Missing resources: {} — this is dangerous in production
# WRONG: Limits without requests — requests default to limits, over-reserves capacity
resources:
limits:
cpu: "2"
memory: "1Gi"
# requests missing — will default to limits values
```
---
## RBAC — Roles and ServiceAccounts
### Principle of Least Privilege
**Two patterns depending on whether the app calls the Kubernetes API:**
#### Pattern A — App does NOT need the Kubernetes API (most apps)
Disable token automounting on the ServiceAccount. The Role/RoleBinding are not needed.
```yaml
# ServiceAccount with token disabled — safest default
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-app-sa
namespace: my-namespace
automountServiceAccountToken: false # No K8s API token injected into pods
```
```yaml
# Reference in Deployment — no token, no API access
spec:
template:
spec:
serviceAccountName: my-app-sa
automountServiceAccountToken: false # Belt-and-suspenders: also set at pod level
```
#### Pattern B — App DOES need the Kubernetes API (operators, controllers, config watchers)
Enable the token and grant only the permissions actually required.
```yaml
# 1. ServiceAccount — enable token for this SA
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-app-sa
namespace: my-namespace
automountServiceAccountToken: true # Token required: app calls K8s API
```
```yaml
# 2. Role — grant only what the app needs (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: my-app-role
namespace: my-namespace
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch"] # Read-only, specific resource
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["my-app-secrets"] # Restrict to specific secret by name
verbs: ["get"]
```
```yaml
# 3. Bind Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: my-app-rolebinding
namespace: my-namespace
subjects:
- kind: ServiceAccount
name: my-app-sa
namespace: my-namespace
roleRef:
kind: Role
apiGroup: rbac.authorization.k8s.io
name: my-app-role
```
```yaml
# 4. Reference SA in Deployment
spec:
template:
spec:
serviceAccountName: my-app-sa
# automountServiceAccountToken defaults to true from SA — token is injected
```
---
## Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app-hpa
namespace: my-namespace
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 2 # Always at least 2 for HA
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale up when avg CPU > 70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
```
> HPA requires `resources.requests` to be set on all containers — it calculates utilization as `current / request`.
---
## PodDisruptionBudget (PDB)
Prevent too many pods going down during node drains or rolling updates:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: my-app-pdb
namespace: my-namespace
spec:
minAvailable: 2 # OR use maxUnavailable: 1
selector:
matchLabels:
app: my-app
```
---
## Namespaces and Multi-Tenancy
```bash
# Create namespace with resource quotas
kubectl create namespace my-namespace
# Apply ResourceQuota to limit namespace consumption
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
name: my-namespace-quota
namespace: my-namespace
spec:
hard:
requests.cpu: "4"
requests.memory: 4Gi
limits.cpu: "8"
limits.memory: 8Gi
pods: "20"
EOF
```
---
## Jobs and CronJobs
```yaml
# One-off Job (DB migration, data processing)
apiVersion: batch/v1
kind: Job
metadata:
name: db-migrate
namespace: my-namespace
spec:
backoffLimit: 3 # Retry up to 3 times on failure
ttlSecondsAfterFinished: 3600 # Auto-delete after 1h
template:
spec:
restartPolicy: OnFailure # Never for Jobs (not Always)
containers:
- name: migrate
image: ghcr.io/org/my-app:1.0.0
command: ["python", "manage.py", "migrate"]
resources:
requests:
cpu: "100m"
memory: "256Mi"
```
```yaml
# CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
name: cleanup-job
namespace: my-namespace
spec:
schedule: "0 2 * * *" # 2am daily
concurrencyPolicy: Forbid # Don't run if previous still running
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
containers:
- name: cleanup
image: ghcr.io/org/cleanup:1.0.0
resources:
requests:
cpu: "50m"
memory: "64Mi"
```
---
## kubectl Debugging Cheatsheet
```bash
# --- Pod status and logs ---
kubectl get pods -n my-namespace
kubectl get pods -n my-namespace -o wide # Show node assignment
kubectl describe pod <pod-name> -n my-namespace # Events and state details
kubectl logs <pod-name> -n my-namespace # Current logs
kubectl logs <pod-name> -n my-namespace --previous # Logs from crashed container
kubectl logs <pod-name> -n my-namespace -c <container> # Multi-container pod
# --- Execute into a running container ---
kubectl exec -it <pod-name> -n my-namespace -- sh
kubectl exec -it <pod-name> -n my-namespace -- bash
# --- Check resource usage ---
kubectl top pods -n my-namespace
kubectl top nodes
# --- Deployment operations ---
kubectl rollout status deployment/my-app -n my-namespace
kubectl rollout history deployment/my-app -n my-namespace
kubectl rollout undo deployment/my-app -n my-namespace # Rollback
kubectl rollout undo deployment/my-app --to-revision=2 -n my-namespace
# --- Scale manually ---
kubectl scale deployment my-app --replicas=5 -n my-namespace
# --- Inspect events (cluster-wide issues) ---
kubectl get events -n my-namespace --sort-by='.lastTimestamp'
# --- Port-forward for local debugging ---
kubectl port-forward pod/<pod-name> 8080:8080 -n my-namespace
kubectl port-forward svc/my-app 8080:80 -n my-namespace
# --- Dry-run to validate YAML ---
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server # Validates against live cluster
```
### Diagnosing Common Errors
```bash
# CrashLoopBackOff: container keeps crashing
kubectl logs <pod-name> --previous -n my-namespace # Check crash logs
kubectl describe pod <pod-name> -n my-namespace # Check exit code & OOMKilled
# ImagePullBackOff: can't pull image
kubectl describe pod <pod-name> -n my-namespace # Check Events section
# Causes: wrong image tag, missing imagePullSecret, private registry
# Pending pod: not scheduled
kubectl describe pod <pod-name> -n my-namespace
# Causes: insufficient resources, no matching node selector, taint/toleration mismatch
# OOMKilled: out of memory
# Increase memory limits, check for memory leaks
kubectl describe pod <pod-name> -n my-namespace | grep -A5 "Last State"
```
---
## Anti-Patterns
```yaml
# BAD: Using :latest tag — non-deterministic deployments
image: myapp:latest
# GOOD: Pin to a specific immutable tag (SHA or semver)
image: ghcr.io/org/myapp:1.4.2
# or
image: ghcr.io/org/myapp@sha256:abc123...
# ---
# BAD: Running as root
securityContext: {} # Defaults to root
# GOOD: Non-root with explicit UID
securityContext:
runAsNonRoot: true
runAsUser: 1001
# ---
# BAD: No resource limits — one pod can starve the entire node
containers:
- name: app
image: myapp:1.0.0
# No resources defined
# GOOD: Always set requests and limits
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
# ---
# BAD: Storing plaintext secrets in ConfigMaps
apiVersion: v1
kind: ConfigMap
data:
DB_PASSWORD: "mysecretpassword" # NEVER — use Secret or external secrets manager
# ---
# BAD: ClusterAdmin for application service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
roleRef:
kind: ClusterRole
name: cluster-admin # Grants god-mode to your app
# ---
# BAD: minAvailable: 0 in PDB — defeats the purpose
spec:
minAvailable: 0
# ---
# BAD: restartPolicy: Always in a Job (causes infinite restart loop)
spec:
restartPolicy: Always # Use OnFailure or Never for Jobs
```
---
## Best Practices Checklist
### Security
- [ ] Container runs as non-root (`runAsNonRoot: true`, `runAsUser` set)
- [ ] `readOnlyRootFilesystem: true` with `emptyDir` for writable paths
- [ ] `allowPrivilegeEscalation: false`
- [ ] All capabilities dropped (`capabilities.drop: [ALL]`)
- [ ] Dedicated ServiceAccount per app, not `default`
- [ ] `automountServiceAccountToken: false` unless needed
- [ ] RBAC follows least privilege (use `Role`, not `ClusterRole` unless needed)
- [ ] Secrets managed via Sealed Secrets or External Secrets Operator
### Reliability
- [ ] All 3 probe types configured (startup + liveness + readiness)
- [ ] Resource requests AND limits set on every container
- [ ] `minReplicas: 2+` for any production workload
- [ ] PodDisruptionBudget defined for stateful or critical services
- [ ] `RollingUpdate` strategy with `maxUnavailable: 0`
- [ ] HPA configured for variable-load services
### Observability
- [ ] App exposes `/health` (liveness) and `/ready` (readiness) endpoints
- [ ] Structured JSON logging (no PII in logs)
- [ ] Resource labels: `app`, `version`, `environment`
---
## Related Skills
- `docker-patterns` — Multi-stage Dockerfiles and image security
- `deployment-patterns` — CI/CD pipelines, rollback strategy, health check endpoints
- `security-review` — Broader security hardening context
- `git-workflow` — GitOps integration with K8s (ArgoCD / Flux patterns)
Creator's repository · affaan-m/everything-claude-code