IDP Blueprint

This document describes the security architecture and policy governance model of the IDP Blueprint platform.

Security Design Principles

The platform’s security model is built on three foundational principles: defense in depth, which layers multiple independent controls; least privilege, which restricts permissions to the minimum necessary; and security as code, which enforces declarative, version-controlled policies.

The CIA Triad Assessment

Security is evaluated through the CIA triad: Confidentiality, Integrity, and Availability. Each dimension has different strengths and areas for improvement in this platform.

Confidentiality

Confidentiality protects sensitive data from unauthorized access.

Current state:

Vault provides centralized secrets management with access control policies, while the External Secrets Operator synchronizes these secrets without exposing them in Git. Kubernetes RBAC restricts API access, and Cert-Manager automates TLS certificate issuance. Crucially, no sensitive data is committed to repositories; all secrets flow through Vault.

Gaps:

The Cilium NetworkPolicy engine is available but no policies are applied, so pods can communicate freely. Pod Security Standards are not enforced, so a misconfigured workload could run privileged containers. Kubernetes Secrets in etcd are not currently encrypted at rest.

Integrity

Integrity ensures data and systems haven’t been tampered with.

Current state:

GitOps via ArgoCD ensures Git remains the single source of truth, creating an audit trail and automatically reverting manual changes. Kyverno validates resources against policies before admission, while immutable containers require new builds for any changes. Trivy provides integrated image scanning for vulnerabilities.

Strengths:

The combination of GitOps and policy validation creates strong integrity guarantees. Every change is tracked in Git, and Kyverno validates that resources meet defined standards.

Policy enforcement mode:

The default Helm chart validationFailureAction is set to audit to guide users without blocking deployments. Individual policies requiring strict safety, such as namespace labels, can be set to enforce. To increase strictness, update Policies/kyverno/values.yaml and document the intent in the commit.
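The difference between the two modes can be sketched in a few lines of Python. This is an illustrative model only, not Kyverno's API: the `Policy` class, the `admit` function, and the `team` label are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Policy:
    """Hypothetical stand-in for a validation policy (not Kyverno's API)."""
    name: str
    check: Callable[[dict], bool]   # returns True when the resource complies
    action: Optional[str] = None    # "audit", "enforce", or None to inherit the default


def admit(resource: dict, policies: List[Policy],
          default_action: str = "audit") -> Tuple[bool, List[Tuple[str, str]]]:
    """Return (admitted, violations) for a resource under the given policies."""
    admitted = True
    violations = []
    for policy in policies:
        if policy.check(resource):
            continue
        action = policy.action or default_action
        violations.append((policy.name, action))
        if action == "enforce":
            admitted = False  # only enforce-mode violations block admission
    return admitted, violations


def has_team_label(resource: dict) -> bool:
    """Example check: the resource carries a (hypothetical) 'team' label."""
    labels = resource.get("metadata", {}).get("labels") or {}
    return "team" in labels


if __name__ == "__main__":
    pod = {"kind": "Pod", "metadata": {"name": "web"}}
    print(admit(pod, [Policy("require-team-label", has_team_label)]))
    print(admit(pod, [Policy("require-team-label", has_team_label, "enforce")]))
```

With the default left at audit, the unlabeled resource is admitted and the violation is only recorded. Setting the same policy to enforce rejects it.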

Availability

Availability ensures systems remain accessible when needed.

Current state:

Services are categorized by tiered criticality to ensure key components survive node failures. Prometheus and Grafana provide visibility into system health, while ArgoCD continuously reconciles the desired state to repair drift.

Gaps:

Most components run as single replicas to suit the edge environment, avoiding the resource overhead of high availability but reducing redundancy. HorizontalPodAutoscalers are not configured, and losing two nodes in a three-node cluster will severely degrade functionality.

Trade-off:

The platform prioritizes resource efficiency over maximum availability. In edge environments with fixed resources, running multiple replicas of everything isn’t viable. Instead, the tiered criticality model ensures the most important components (ArgoCD, Prometheus) survive failures.

Defense in Depth Layers

The platform implements security across multiple layers:

Layer 1: Network Security

Cilium CNI provides the network layer, offering L3/L4 and L7 segmentation via NetworkPolicies, traffic visibility through Hubble, and a sidecar-free service mesh.

Current gap: NetworkPolicies are not implemented. This means pods can communicate freely, which is convenient but less secure. Implementing default-deny NetworkPolicies would significantly improve the confidentiality posture.
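As a starting point, a default-deny policy for one namespace could look like the sketch below. It builds a standard Kubernetes NetworkPolicy manifest as a Python dictionary and prints it as JSON, which kubectl also accepts. The policy name and the `demo` namespace are assumptions for illustration.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # 'demo' is a placeholder namespace.
    print(json.dumps(default_deny_policy("demo"), indent=2))
```

Denying egress also blocks DNS lookups, so a policy like this is normally paired with explicit allow rules for DNS and for each required service-to-service path.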

Layer 2: Identity & Access Control

Vault stores secrets with access policies.

External Secrets Operator synchronizes secrets into Kubernetes using service account authentication. Applications never directly access Vault.
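The synchronization step can be pictured as an ExternalSecret resource that maps a Vault path to a Kubernetes Secret. The sketch below builds such a manifest in Python. The store name `vault-backend`, the Vault path, and the secret names are hypothetical, and the `apiVersion` is an assumption that depends on the operator version installed.

```python
import json


def external_secret(name: str, namespace: str, vault_key: str, prop: str) -> dict:
    """Build an ExternalSecret that copies one Vault property into a Kubernetes Secret."""
    return {
        # Assumption: adjust to the API version served by the installed operator.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            # Hypothetical store name; the store holds the Vault connection and auth settings.
            "secretStoreRef": {"name": "vault-backend", "kind": "ClusterSecretStore"},
            # The Kubernetes Secret the operator creates and keeps in sync.
            "target": {"name": name},
            "data": [
                {
                    "secretKey": prop,
                    "remoteRef": {"key": vault_key, "property": prop},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values: only this reference lives in Git, never the secret itself.
    print(json.dumps(external_secret("db-credentials", "demo", "apps/demo/db", "password"), indent=2))
```

Only the reference to the Vault path is committed to Git. The secret value itself is fetched by the operator at runtime.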

Kubernetes RBAC controls who can access the Kubernetes API and what actions they can perform.

Layer 3: Admission Control

Kyverno validates, mutates, and generates resources during admission. Its policies require labels for governance, set resource limits, and validate best practices such as avoiding latest tags. Image verification capabilities exist but are not fully utilized.

Running in audit mode means violations are reported but not blocked. This is a conscious choice to reduce friction while building policy maturity. Policies can be migrated to enforce mode as the platform and its users mature.
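To make one of these checks concrete, the sketch below shows the kind of logic behind a "no latest tag" rule. It is a plain Python illustration, not the platform's Kyverno policy, and it assumes that an image with no tag should count as latest (the default that container runtimes apply) while an image pinned by digest should pass.

```python
from typing import List


def uses_latest_tag(image: str) -> bool:
    """Return True when the image reference resolves to the mutable 'latest' tag."""
    if "@" in image:
        return False  # pinned by digest, so the tag is irrelevant
    # Drop the registry host and path; the host may contain a port such as ':5000'.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag given: runtimes default to 'latest'
    return last_segment.rsplit(":", 1)[1] == "latest"


def latest_tag_violations(pod: dict) -> List[str]:
    """List the names of containers in a Pod manifest that use the latest tag."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [c["name"] for c in containers if uses_latest_tag(c.get("image", ""))]


if __name__ == "__main__":
    pod = {
        "spec": {
            "containers": [
                {"name": "web", "image": "nginx:latest"},
                {"name": "api", "image": "registry.local:5000/api:v1.4.2"},
            ]
        }
    }
    print(latest_tag_violations(pod))
```

In audit mode a result like this would surface as a reported violation. In enforce mode the same result would cause the resource to be rejected.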

Layer 4: Runtime Security

Trivy scans container images for vulnerabilities and misconfigurations. It can run in CI pipelines to block vulnerable images or as an operator to periodically scan running workloads.

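Resource Limits: PriorityClasses and resource limits reduce the impact of resource exhaustion attacks. Limits bound what a single workload can consume, and PriorityClasses let critical components be scheduled ahead of less important ones.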
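A minimal sketch of how these two controls fit together is shown below. The tier names, priority values, and resource figures are assumptions for illustration and are not taken from the platform's actual configuration.

```python
import json

# Hypothetical tiers: a higher value means higher scheduling priority.
TIERS = {
    "platform-critical": 1_000_000,
    "platform-standard": 100_000,
    "workload-default": 1_000,
}


def priority_class(name: str, value: int) -> dict:
    """Build a PriorityClass manifest for one criticality tier."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": f"Criticality tier '{name}' (illustrative value).",
    }


def container_with_limits(name: str, image: str, tier: str) -> dict:
    """Build a Pod spec fragment that sets a priority tier and resource bounds."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return {
        "priorityClassName": tier,
        "containers": [
            {
                "name": name,
                "image": image,
                "resources": {
                    # Placeholder figures; size these per workload.
                    "requests": {"cpu": "100m", "memory": "128Mi"},
                    "limits": {"cpu": "500m", "memory": "256Mi"},
                },
            }
        ],
    }


if __name__ == "__main__":
    for tier_name, tier_value in TIERS.items():
        print(json.dumps(priority_class(tier_name, tier_value), indent=2))
```

When a node runs short of resources, the scheduler can preempt pods in lower tiers to place pods in higher tiers, which is what allows the most important components to keep running.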

Layer 5: Observability & Audit

Prometheus + Grafana provide visibility into system behavior, enabling detection of anomalies.

Loki aggregates logs, creating an audit trail of events.

PolicyReports (Kyverno): Track policy compliance over time via Policy Reporter.

Git Audit Trail: All changes flow through Git, creating a complete history.

Threat Model

Understanding what threats this architecture defends against (and what it doesn’t) is important.

Threats Addressed

Kyverno flags misconfigurations at admission, while Vault and External Secrets keep secrets out of Git. GitOps prevents unauthorized changes from persisting, Trivy detects known vulnerabilities in container images, and PriorityClasses together with resource limits contain resource exhaustion.

Threats Not Fully Addressed

Lateral movement remains possible without NetworkPolicies, and privilege escalation risks exist without Pod Security Admission. Data encryption at rest is not enabled for etcd, and there is no specific DDoS protection or advanced insider threat detection.
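Of these gaps, Pod Security Admission is typically the cheapest to close, because it is driven by namespace labels. The sketch below builds a Namespace manifest carrying the standard `pod-security.kubernetes.io` labels. The namespace name and the chosen levels are assumptions for illustration.

```python
import json

# The three levels defined by the Kubernetes Pod Security Standards.
PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_psa(name: str, enforce: str = "baseline",
                       warn: str = "restricted", audit: str = "restricted") -> dict:
    """Build a Namespace manifest labeled for Pod Security Admission."""
    for level in (enforce, warn, audit):
        if level not in PSA_LEVELS:
            raise ValueError(f"unknown Pod Security level: {level}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # 'enforce' rejects violating pods; 'warn' and 'audit' only report.
                "pod-security.kubernetes.io/enforce": enforce,
                "pod-security.kubernetes.io/warn": warn,
                "pod-security.kubernetes.io/audit": audit,
            },
        },
    }


if __name__ == "__main__":
    # 'demo' is a placeholder namespace.
    print(json.dumps(namespace_with_psa("demo"), indent=2))
```

Enforcing `baseline` while warning and auditing at `restricted` follows the same gradual approach the platform takes with Kyverno: report first, then tighten.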

Security Roadmap

Areas for future improvement include implementing default-deny NetworkPolicies, enforcing Pod Security Standards, and enabling etcd encryption. The roadmap also covers enabling mTLS via Cilium, migrating Kyverno policies to enforce mode, requiring image signature verification, and centralizing audit logs in Loki.
