The platform treats capacity as a product feature. PriorityClasses and pools ensure critical planes remain responsive even under pressure.
Mental model
Detailed Scheduling Architecture
This diagram shows how Kubernetes scheduler makes placement decisions based on PriorityClasses, node labels, taints/tolerations, and resource availability.
Scheduling Decision Flow
- Pod arrives with
priorityClassNamespecified - Scheduler evaluates:
- Priority value (higher = more important)
- Resource requests (CPU, memory)
- Node selectors and affinity rules
- Taints and tolerations
- Node selection:
- Prefers nodes matching labels
- Respects taints (unless pod has tolerations)
- Considers resource availability
- Under pressure:
- Scheduler blocks new low-priority pods
- Kubelet evicts pods starting with lowest priority
- Critical pods (Tier 1-2) remain running
Lifeboat Strategy
Critical infrastructure pods (Vault, ArgoCD, Prometheus) have tolerations for control-plane taint. This allows them to schedule on the control plane node as a “lifeboat” if all infrastructure nodes fail.
Toleration example:
tolerations: - key: node-role.kubernetes.io/control-plane operator: Exists effect: NoScheduleThis ensures observability and GitOps remain functional even if only the control plane survives.
How PriorityClasses are defined here
IT/priorityclasses/priorityclasses.yaml is the authoritative source applied by ArgoCD.
We use four intent buckets—control plane, platform services, shared UIs, and
execution/user workloads—where the relative ordering matters more than exact integer
values. Higher buckets are protected from preemption, while execution and user tiers are
evicted first under pressure. Critical pods tolerate control-plane taints to use the
control plane as a “lifeboat” if infrastructure nodes fail.
When adjusting priorities, change the manifests first; update this page only to describe the rationale and the relative ordering.
Key points:
PriorityClasses are declared in IT/priorityclasses/priorityclasses.yaml and include
tiers for infrastructure, events, policy, security, observability, CI/CD, dashboards,
and user workloads. Every chart must set a priorityClassName to fail fast if missing.
Pool separation reduces noisy neighbors, while DaemonSets like Cilium and Fluent-bit run
everywhere. High availability is a dial that can be enabled for control planes as the
platform grows.
Zero-Downtime Rolling Updates
Critical platform components use maxUnavailable: 0 in their rolling update strategy to
ensure service continuity during updates. This forces Kubernetes to create the new pod
and wait for it to be ready before terminating the old one.
Why not apply everywhere?
Requiring zero unavailability has trade-offs:
| Consideration | Impact |
|---|---|
| Resource overhead | Needs headroom for surge pods during updates |
| Scheduler complexity | Must find capacity for both old and new pods simultaneously |
| Update latency | Updates take longer since we wait for full readiness |
For non-critical components (dashboards, reports, scans), a brief interruption is acceptable and allows the scheduler to work more efficiently.
Components with Zero-Downtime Strategy
These workloads use maxUnavailable: 0 because their unavailability would block
platform operations:
| Component | Why Critical |
|---|---|
| Cilium Operator | CNI reconciliation stops → network policies don’t update |
| Vault | Secret source of truth → all secret syncs fail |
| ArgoCD (all) | GitOps engine → deployments stop, UI/API unavailable |
| Cert-Manager | Certificate automation stops → TLS renewals fail |
| External Secrets | Secret sync stops → workloads can’t get credentials |
| Kyverno Admission | Policy enforcement stops → deployments may be blocked |
| Argo Workflows Controller | Workflow execution stops → CI/CD pipelines stall |
| Argo Events Controller | Event mesh stops → no reactive automation |
Components that Accept Brief Downtime
These workloads use default rolling update strategy (typically maxUnavailable: 25%) as
a trade-off for simpler scheduling:
| Component | Rationale |
|---|---|
| Grafana | Visualization only, no data loss |
| Policy Reporter | Compliance dashboards, reports regenerate |
| Prometheus | Metrics buffered, brief gap acceptable |
| Loki | Logs buffered by Fluent-bit |
| SonarQube | Analysis can wait, stateful data persists |
| Trivy Operator | Scans can be delayed |
| Argo Workflows Server | UI only, controller keeps workflows running |
| Kyverno Background/Cleanup | Non-blocking operations |
Configuration Pattern
Zero-downtime strategy in Helm values:
deploymentStrategy: type: RollingUpdate rollingUpdate: maxUnavailable: 0 maxSurge: 1For StatefulSets (like Vault), use updateStrategyType: RollingUpdate.