Think of it as an Operating System for your containers — it runs, heals, and scales them so you don't have to.
The City Planner Analogy
Imagine you're the mayor of a city. You don't personally construct buildings, lay pipes, or direct traffic. Instead, you set policies — "we need 5 hospitals, 10 schools, and roads connecting them." City departments handle the rest. If a hospital burns down, they rebuild it automatically. If population grows, they build more schools. That's Kubernetes.
City Hall → Control Plane
Neighborhoods → Worker Nodes
Buildings → Containers
City Blocks → Pods
The Problem It Solves
# 3am. PagerDuty wakes you up.
$ ssh prod-server-12
$ docker ps | grep api
# Container crashed. Again.
$ docker run -d --restart=always \
    -p 8080:8080 myapp:v2.3.1
# Wait, was it v2.3.1 or v2.3.2?
# Which servers have the new version?
$ for server in prod-{1..20}; do
    ssh $server "docker ps"
  done
# 4am. Still debugging. 😩

- Manual restarts when containers crash
- No easy way to scale up or down
- Version mismatches across servers
- Load balancing is your problem
- Deployments = fear and downtime
# deployment.yaml - the whole story
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 5
  template:
    spec:
      containers:
      - name: api
        image: myapp:v2.3.2

# kubectl apply -f deployment.yaml
# Container crashes? K8s restarts it.
# Need more? Change replicas: 20
# Go back to sleep. 😴

- Self-healing: crashed containers restart automatically
- Declarative scaling: change a number, done
- Rolling updates with zero downtime
- Built-in service discovery and load balancing
- Same config = same infrastructure, anywhere
Core Concepts
Pods
The smallest deployable unit. A pod wraps one or more containers that share networking and storage, so they're always scheduled together and can reach each other over localhost.
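As a sketch (the names and image are illustrative), a bare Pod manifest is only a few lines:

```yaml
# pod.yaml — a minimal Pod wrapping a single container (illustrative names)
apiVersion: v1
kind: Pod
metadata:
  name: hello
  labels:
    app: hello
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
```

In practice you rarely apply a bare Pod like this — a Deployment creates and manages Pods for you.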
Nodes
Physical or virtual machines that run your pods. Each node has a kubelet agent that communicates with the control plane.
Clusters
A set of nodes managed together. One cluster = one control plane + multiple worker nodes. Your entire Kubernetes environment.
Services
A stable networking endpoint for accessing pods. Pods come and go, but Services provide a fixed address and load balance traffic across them.
Deployments
Manages your pods — how many replicas to run, how to roll out updates, and how to roll back if things go wrong.
Namespaces
Virtual clusters within a cluster. Isolate teams, environments (dev/staging/prod), or applications from each other.
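A minimal sketch of creating one (the name `staging` is illustrative):

```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
```

Apply it with `kubectl apply -f namespace.yaml`, then target it with the `-n` flag: `kubectl get pods -n staging` or `kubectl apply -n staging -f deployment.yaml`.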
ConfigMaps & Secrets
Decouple configuration from code. ConfigMaps hold non-sensitive data; Secrets hold passwords, tokens, and keys. Note that Secrets are only base64-encoded, not encrypted — enable encryption at rest and restrict access with RBAC to actually protect them.
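A sketch of both (key names and values are illustrative), loaded into a container as environment variables:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:            # plain text here; Kubernetes stores it base64-encoded
  DB_PASSWORD: s3cr3t
---
# In the container spec, load both at once:
# containers:
# - name: api
#   envFrom:
#   - configMapRef:
#       name: app-config
#   - secretRef:
#       name: app-secrets
```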
Volumes
Persistent storage that outlives containers. When a pod restarts, the data survives. Supports cloud disks, NFS, and more.
Architecture
API Server
Front door for all operations
etcd
Cluster's source of truth
Scheduler
Picks the best node for pods
Controllers
Keep desired = actual state
How It Works
When you deploy an application, here's the chain of events inside the cluster:
You write a manifest
A YAML file describing your desired state — "I want 3 replicas of my API running."
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
      - name: api
        image: myapp:v1.0.0
        ports:
        - containerPort: 8080

kubectl sends it to the API Server
The API Server validates the manifest, authenticates you, and stores the desired state in etcd.
$ kubectl apply -f deployment.yaml
deployment.apps/my-api created
# Behind the scenes:
# 1. kubectl → API Server (HTTPS)
# 2. API Server validates YAML
# 3. API Server → etcd (stores desired state)
# 4. API Server confirms back to you

Scheduler assigns pods to nodes
The Scheduler watches for unassigned pods and picks the best node based on available resources, constraints, and affinity rules. It considers CPU, memory, disk, and even custom rules you define.
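For example, a nodeSelector in the pod template restricts scheduling to nodes carrying a given label (the `disktype: ssd` label here is hypothetical):

```yaml
# Pod template fragment — schedule only onto nodes labeled disktype=ssd
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: api
    image: myapp:v1.0.0
```

You'd first label a node with `kubectl label nodes <node-name> disktype=ssd`; affinity rules offer the same idea with softer, weighted preferences.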
kubelet pulls the image and starts containers
On each assigned node, the kubelet pulls the container image and tells the container runtime (typically containerd) to start the containers. The pod is now Running.
Controllers ensure desired = actual state
The Deployment controller continuously watches. If a pod crashes, it creates a new one. If you change replicas from 3 to 5, it spins up 2 more. This is the reconciliation loop — the heart of Kubernetes.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-api-7d9b4c5f6-abc12 1/1 Running 0 2m
my-api-7d9b4c5f6-def34 1/1 Running 0 2m
my-api-7d9b4c5f6-ghi56 1/1 Running 0 2m
# Kill a pod — watch K8s bring it back
$ kubectl delete pod my-api-7d9b4c5f6-abc12
pod "my-api-7d9b4c5f6-abc12" deleted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-api-7d9b4c5f6-def34 1/1 Running 0 3m
my-api-7d9b4c5f6-ghi56 1/1 Running 0 3m
my-api-7d9b4c5f6-xyz99 1/1 Running 0 5s   ← New pod!

Code Examples
Deploy an Application
Write YAML manifests and apply them directly. The most explicit approach — you see exactly what's being created.
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP

$ kubectl apply -f deployment.yaml -f service.yaml
deployment.apps/web-app created
service/web-app created

Scale Your App
Scale imperatively with a command, or declaratively by editing the YAML.
# Imperative — quick and direct
$ kubectl scale deployment web-app --replicas=10
deployment.apps/web-app scaled
# Or auto-scale based on CPU
$ kubectl autoscale deployment web-app \
    --min=3 --max=20 --cpu-percent=70
horizontalpodautoscaler.autoscaling/web-app autoscaled

# hpa.yaml — Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Rolling Updates
Update the image and watch Kubernetes gradually replace old pods with new ones — zero downtime.
# Update the image
$ kubectl set image deployment/web-app web=nginx:1.26
deployment.apps/web-app image updated
# Watch the rollout
$ kubectl rollout status deployment/web-app
Waiting for deployment "web-app" rollout to finish:
2 out of 3 new replicas have been updated...
3 of 3 updated replicas are available.
deployment "web-app" successfully rolled out
# Something broke? Roll back instantly
$ kubectl rollout undo deployment/web-app
deployment.apps/web-app rolled back
# Check rollout history
$ kubectl rollout history deployment/web-app

# Control rollout strategy in YAML
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Max extra pods during update
      maxUnavailable: 0  # Always maintain full capacity

Expose to the Internet
Use an Ingress resource to route external traffic to your service with TLS termination.
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: web-app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-app
            port:
              number: 80

kubectl vs Helm vs Kustomize
kubectl
Direct communication with the Kubernetes API. Write plain YAML, apply it. No abstraction layer.
Helm
The "package manager" for Kubernetes. Charts bundle templates + values for reusable, versionable deployments. Huge ecosystem of pre-made charts.
Kustomize
Overlay-based configuration. Keep plain YAML (no templates), layer environment-specific patches on top. Built into kubectl since v1.14.
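As a rough sketch of the difference (the release name, chart, and directory layout are illustrative):

```shell
# Helm: install a packaged chart with one command
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/nginx

# Kustomize: apply a shared base plus an environment-specific overlay
$ kubectl apply -k overlays/prod
```

The `-k` flag reads a `kustomization.yaml` in the target directory, which lists a common `resources:` base and the patches that distinguish prod from dev or staging.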
When to use what?
Just learning?
Start with kubectl. Understand the raw YAML before adding abstractions.
Installing software?
Use Helm. Install Prometheus, Grafana, nginx-ingress in one command.
Multi-env configs?
Use Kustomize. Same base, different overlays for dev/staging/prod.
Networking
ClusterIP (default)
Internal-only. Pods can reach each other via service name. Not accessible from outside the cluster.
Use for: internal APIs, databases, caches
NodePort
Opens a static port (default range 30000-32767) on every node. External traffic hits NodeIP:Port.
Use for: development, bare-metal clusters
LoadBalancer
Provisions a cloud load balancer (AWS ALB/NLB, GCP LB). Gets an external IP automatically.
Use for: exposing a single service on cloud
Ingress
HTTP/HTTPS routing rules. One load balancer → multiple services via host/path rules. TLS termination.
Use for: production — the standard approach
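For contrast with the ClusterIP Service shown earlier, here is a sketch of the other two Service types (names and port numbers are illustrative):

```yaml
# NodePort — reachable at <any-node-ip>:30080
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080
---
# LoadBalancer — the cloud provider assigns an external IP
apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 80
```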
Storage
Containers are ephemeral — when they restart, data is gone. Kubernetes Volumes solve this with a layered abstraction: admins provision storage, developers request it.
PersistentVolume (PV)
A piece of storage provisioned by an admin or dynamically. Cluster-level resource.
PersistentVolumeClaim (PVC)
A request for storage by a pod. "I need 10Gi of fast SSD storage."
StorageClass
Defines how storage is dynamically provisioned. "fast" = SSD, "standard" = HDD.
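A sketch of what a `fast-ssd` class might look like — the provisioner and parameters are cloud-specific, and the AWS EBS CSI driver is shown here purely as an assumption:

```yaml
# storageclass.yaml — dynamic provisioning policy (AWS EBS CSI assumed)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```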
# pvc.yaml — Request storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-storage
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
---
# Use it in a pod
spec:
  containers:
  - name: postgres
    image: postgres:16
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: db-storage

Production Readiness
Getting containers running is step one. Running them reliably in production requires health checks, resource limits, auto-scaling, and access control.
Health Probes
How Kubernetes knows if your app is alive, ready for traffic, and started successfully.
containers:
- name: api
  image: myapp:v1.0.0
  # Is the container alive? (restart if not)
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 15
  # Is it ready for traffic? (remove from LB if not)
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
  # Has it started? (for slow-starting apps)
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30
    periodSeconds: 10

livenessProbe
Container stuck? Restart it.
readinessProbe
Not ready? Stop sending traffic.
startupProbe
Still booting? Don't kill it yet.
Resource Requests & Limits
Guarantee minimum resources and cap maximum usage. Prevents noisy neighbors and OOM kills.
resources:
  # Guaranteed minimum — scheduler uses this to place pods
  requests:
    cpu: 250m       # 0.25 CPU cores
    memory: 256Mi   # 256 MB RAM
  # Hard ceiling — container is killed if it exceeds memory limit
  limits:
    cpu: 500m       # 0.5 CPU cores (throttled, not killed)
    memory: 512Mi   # 512 MB RAM (OOMKilled if exceeded)

✅ Best Practice
- Always set requests (scheduling depends on it)
- Set memory limits (prevents OOM cascades)
- CPU limits are optional (throttling vs. killing)
❌ Anti-Pattern
- No limits = one pod can starve a whole node
- Limits too low = constant OOMKills and restarts
- Requests too high = wasted cluster capacity
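To avoid relying on every team remembering this, a LimitRange can apply namespace-wide defaults (the values here are illustrative):

```yaml
# limitrange.yaml — default requests/limits for containers in the dev namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    defaultRequest:   # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:          # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
```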
RBAC (Role-Based Access Control)
Control who can do what in your cluster. Essential for multi-team environments.
# role.yaml — Define permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
# rolebinding.yaml — Assign to user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: User
  name: jane@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Why Use Kubernetes?
Self-Healing
Crashed containers restart automatically. Failed nodes get their pods rescheduled. No 3am pages.
Auto-Scaling
Scale pods based on CPU, memory, or custom metrics. Scale nodes based on demand. Handle traffic spikes automatically.
Zero-Downtime Deployments
Rolling updates replace pods gradually. If a new version fails health checks, rollback happens automatically.
Service Discovery
Pods find each other by name. No hardcoded IPs. Built-in DNS resolves service names to pod endpoints.
Secrets Management
Store and inject credentials without baking them into images. Rotate secrets without redeploying code.
Run Anywhere
AWS EKS, GCP GKE, Azure AKS, bare metal, or your laptop. Same YAML works everywhere.
When to Use It
✓ Use when:
- Running microservices that need to scale independently
- You need auto-scaling for variable traffic patterns
- Multi-cloud or hybrid deployments are required
- Team needs standardized deployment workflows
- Zero-downtime deployments are a hard requirement
- You're running 10+ services in production
- You need strong isolation between teams/environments
⚠️ Skip if:
- You have a simple app that runs on one server
- Your team is small and doesn't have K8s experience
- The app is a monolith that doesn't need scaling
- You're prototyping or building an MVP
- A PaaS (Heroku, Railway, Fly.io) would suffice
- You can't dedicate time to learn and maintain it
- Your workload is serverless (Lambda/Cloud Functions)
Trade-offs
Pros
- Industry standard — massive community and ecosystem
- Cloud-agnostic — same YAML runs on AWS, GCP, Azure
- Self-healing and auto-scaling out of the box
- Declarative — describe what you want, not how to get there
- Extensible — Custom Resource Definitions (CRDs) for anything
- Battle-tested — runs at Google, Spotify, Airbnb scale
Cons
- Steep learning curve — lots of concepts to internalize
- Operational overhead — clusters need maintenance, upgrades
- Resource hungry — control plane alone needs 2+ CPU, 4GB+ RAM
- YAML fatigue — verbose configs even for simple things
- Debugging is hard — distributed systems are inherently complex
- Overkill for simple apps — sometimes docker compose is enough
Key Takeaways
Kubernetes is a container orchestrator
It doesn't run containers — it manages them. It decides where they run, restarts them when they fail, and scales them when needed.
Declarative, not imperative
You describe the desired state ("I want 5 replicas"). Kubernetes figures out how to get there and keeps it that way.
Pods are the smallest unit
You don't deploy containers directly — you deploy Pods (which wrap containers). But you usually don't create Pods directly either — you use Deployments.
Services provide stable networking
Pods are ephemeral — they get new IPs every time. Services give you a stable endpoint that load-balances across healthy pods.
Pick the right tool for the job
kubectl for learning & debugging, Helm for packaging complex apps, Kustomize for environment-specific overlays. They're not mutually exclusive.
Production needs more than just deploying
Health probes, resource limits, RBAC, and monitoring are not optional. A cluster without these is a ticking time bomb.
Start with managed Kubernetes
Don't run your own control plane. Use EKS, GKE, or AKS — they handle upgrades, etcd backups, and high availability. Focus on your apps.
It's complex, but worth it at scale
Kubernetes has a real learning curve. But once you're running 10+ services that need to scale, heal, and update independently — nothing else comes close.