Context and goals

IDP Blueprint provides a compact, self-hosted platform stack for Kubernetes clusters. The architecture supports edge, on-premises, and constrained environments where horizontal scaling is limited, though the same patterns apply to larger deployments. The platform operates GitOps-first with Git as the source of truth and a reconciliation controller handling continuous synchronization. No managed control planes or commercial licenses are required, making it fully cloud-agnostic.

The platform serves three primary use cases. Engineers can evaluate a realistic platform stack on a laptop or lab cluster without cloud dependencies. Teams can prototype internal developer platforms without vendor commitments. Organizations can train platform, SRE, and security engineers on GitOps and policy-driven operations with hands-on infrastructure.

System context

A single Kubernetes cluster sits between engineers and Git. Git owns all intent, ArgoCD reconciles that intent into the cluster, and traffic flows back out through Gateway API. Platform engineers operate the stack while application teams ship workloads through it. External systems include the Git provider as source of truth, container registries for images, and optional cloud services for external secret stores. The deployment target is one cluster—either local k3d or remote—treated as interchangeable infrastructure.

System Context Diagram

Container view

:::note
This diagram zooms into the IDP Blueprint system to show the major containers (applications and data stores) and their interactions. Each container is a separately deployable unit.
:::

The architecture is organized into distinct logical planes, designed to separate concerns and maximize stability.

At the foundation lies the Infrastructure Core, which provides the compute control plane via the Kubernetes API and etcd. Networking and ingress are handled here by Cilium CNI and the Gateway API, ensuring secure and performant traffic management.

Above this sits the Platform Services layer, supplying essential utilities for operation. This includes HashiCorp Vault for secrets management, the External Secrets Operator (ESO) for syncing credentials, and cert-manager for automated PKI. An observability suite also resides here: Prometheus for metrics, with Loki and Fluent-bit for logs.

To ensure stability and compliance, the Automation and Governance layer enforces state through ArgoCD’s reconciliation loops and continuously audits configuration using Kyverno policies.

Finally, the Developer-Facing Stacks provide the actual tools for engineering teams. These include dashboards (Grafana, Pyrra), CI/CD pipelines (Argo Workflows), code quality gates (SonarQube), and security scanners (Trivy Operator).

Components are deployed in two phases: bootstrap (one-time installation of foundational infrastructure) and continuous reconciliation (GitOps-managed application stacks).

Container View

Platform layers

The same components can be viewed as a set of logical layers:

| Layer | Components (examples) | Responsibility |
| --- | --- | --- |
| Infrastructure core | Kubernetes, Cilium, Gateway API | Scheduling, networking, traffic in/out |
| Platform services | Vault, ESO, cert-manager, Prometheus, Loki, Fluent-bit | Secrets, PKI, metrics, logs |
| Automation & governance | ArgoCD, ApplicationSets, Kyverno, Policy Reporter | GitOps, reconciliation, policies, compliance |
| Developer-facing stacks | Grafana, Pyrra, Argo Workflows, SonarQube, Trivy Operator | Dashboards, pipelines, scanning |

This layering is reflected in the repository layout and in the deployment order.
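As a rough illustration of how layering could translate into an ordering, the sketch below sorts a few components by layer. The component-to-layer mapping follows the table above; the numeric ranks, the identifier strings, and the `deployment_order` helper are assumptions made for this example and do not describe how the repository actually sequences installs.

```python
# Illustrative only: layer membership follows the table above, while the
# numeric ranks and this ordering logic are assumptions for the example.
LAYER_RANK = {
    "infrastructure-core": 0,
    "platform-services": 1,
    "automation-governance": 2,
    "developer-facing": 3,
}

COMPONENT_LAYER = {
    "Grafana": "developer-facing",
    "Kyverno": "automation-governance",
    "Cilium": "infrastructure-core",
    "Vault": "platform-services",
    "ArgoCD": "automation-governance",
    "cert-manager": "platform-services",
    "Trivy Operator": "developer-facing",
    "Gateway API": "infrastructure-core",
}


def deployment_order(component_layer):
    """Return component names sorted by layer rank, then by name."""
    return sorted(
        component_layer,
        key=lambda name: (LAYER_RANK[component_layer[name]], name.lower()),
    )


if __name__ == "__main__":
    for name in deployment_order(COMPONENT_LAYER):
        print(f"{COMPONENT_LAYER[name]:<22} {name}")
```

In practice the order is not a strict walk through the layers: the reconciliation controller, for instance, is installed during bootstrap so that it can manage the remaining stacks.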

GitOps backbone

GitOps automation spans three layers of Git-managed configuration, which are separate from the platform layers described above:

  1. Bootstrap: One-time installation of core infrastructure (networking, secrets management, PKI, reconciliation controller, ingress) along with foundational namespaces and RBAC.
  2. Application Stacks: Continuously reconciled state for observability, CI/CD, security, and developer tooling, organized by functional concern.
  3. Policies: Declarative governance rules and compliance configuration.

The reconciliation controller applies all changes from Git to the cluster. Manual cluster modifications are treated as drift and automatically reverted. For implementation details, see GitOps model and Application architecture.
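The drift-and-revert behaviour can be pictured with a toy reconciliation pass. This is a minimal sketch, assuming desired and live state are plain dictionaries keyed by resource name; `detect_drift` and `reconcile` are hypothetical helpers written for this example and are not ArgoCD APIs.

```python
from copy import deepcopy


def detect_drift(desired, live):
    """Return {name: (action, spec)} needed to make live match desired."""
    changes = {}
    for name, spec in desired.items():
        if name not in live:
            changes[name] = ("create", spec)
        elif live[name] != spec:
            changes[name] = ("update", spec)
    for name in live:
        if name not in desired:
            # Present in the cluster but no longer declared in Git.
            changes[name] = ("delete", None)
    return changes


def reconcile(desired, live):
    """Apply one pass and return the corrected state; inputs are not mutated."""
    result = deepcopy(live)
    for name, (action, spec) in detect_drift(desired, live).items():
        if action == "delete":
            del result[name]
        else:
            result[name] = deepcopy(spec)
    return result


if __name__ == "__main__":
    git = {"grafana": {"replicas": 1}, "loki": {"replicas": 1}}
    # Someone scaled grafana by hand; old-exporter was removed from Git.
    cluster = {"grafana": {"replicas": 3}, "old-exporter": {"replicas": 1}}
    print(detect_drift(git, cluster))
    print(reconcile(git, cluster) == git)
```

A real controller compares rendered manifests against live objects and only acts on resources it tracks. In ArgoCD, automatic reverting of manual changes and deletion of removed resources are opt-in settings of the automated sync policy, so the behaviour described here assumes they are enabled.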

Resilience on constrained clusters

High availability in edge and resource-constrained environments requires different strategies than large cloud deployments. The design prioritizes tiered criticality: the core control plane (Kubernetes API and etcd) receives highest priority, critical infrastructure (reconciliation controller, observability) must survive node loss, and other components can degrade or restart later.

Scheduling mechanisms enforce this prioritization:

  • PriorityClasses: Separate infrastructure from application workloads
  • Node labels: Define resource pools (control plane, infrastructure, workloads)
  • Tolerations: Allow critical components to schedule on control plane nodes during degraded conditions
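To make these three mechanisms concrete, the sketch below builds the corresponding Kubernetes objects as Python dictionaries and prints them as JSON. The tier names, priority values, and the `example.com/node-pool` label key are hypothetical; only the field names and the `node-role.kubernetes.io/control-plane` taint key come from the Kubernetes API itself.

```python
import json

# Hypothetical tier names and values, chosen only to show relative ordering.
PRIORITY_TIERS = {
    "platform-critical": 900_000,
    "platform-standard": 500_000,
    "workload-default": 100_000,
}


def priority_class(name, value):
    """Build a scheduling.k8s.io/v1 PriorityClass manifest."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": f"Illustrative tier: {name}",
    }


def critical_pod_scheduling(pool):
    """Pod-spec fragment for a critical component.

    Prefers nodes in the given pool (a soft preference, so the pod can still
    land elsewhere) and tolerates the control-plane taint.
    """
    return {
        "priorityClassName": "platform-critical",
        "affinity": {
            "nodeAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [{
                    "weight": 100,
                    "preference": {
                        "matchExpressions": [{
                            "key": "example.com/node-pool",  # hypothetical label
                            "operator": "In",
                            "values": [pool],
                        }],
                    },
                }],
            },
        },
        "tolerations": [{
            "key": "node-role.kubernetes.io/control-plane",
            "operator": "Exists",
            "effect": "NoSchedule",
        }],
    }


if __name__ == "__main__":
    for name, value in PRIORITY_TIERS.items():
        print(json.dumps(priority_class(name, value), indent=2))
    print(json.dumps(critical_pod_scheduling("infrastructure"), indent=2))
```

A preferred (rather than required) node affinity is used here so that the toleration has an effect: a hard node selector would keep the pod pinned to its pool even when that pool is gone.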

When a node fails, the platform preserves visibility (metrics, logs) and maintains repair capability (GitOps reconciliation), even if some stacks run in degraded mode. See Scheduling, priority, and node pools for operational details.
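The effect of tiered criticality after a node loss can be approximated with a toy admission model. This is a deliberately simplified sketch: it treats the cluster as one pool of CPU and admits workloads greedily by priority, whereas the real scheduler works per node and can preempt running pods. The workload names, priorities, and CPU figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    priority: int    # higher means more critical
    cpu_millis: int  # requested CPU in millicores


def place_by_priority(workloads, capacity_millis):
    """Greedy admission: highest priority first, skip whatever does not fit."""
    running, pending = [], []
    remaining = capacity_millis
    for w in sorted(workloads, key=lambda w: -w.priority):
        if w.cpu_millis <= remaining:
            running.append(w.name)
            remaining -= w.cpu_millis
        else:
            pending.append(w.name)
    return running, pending


if __name__ == "__main__":
    stack = [
        Workload("sonarqube", 100, 1000),
        Workload("argocd", 900, 500),
        Workload("prometheus", 900, 700),
        Workload("grafana", 500, 300),
        Workload("argo-workflows", 100, 400),
    ]
    print("healthy :", place_by_priority(stack, 3000))
    print("degraded:", place_by_priority(stack, 1600))
```

With the reduced capacity, the reconciliation controller and metrics stay up while the lower-priority developer tooling waits, which mirrors the goal of keeping visibility and repair capability during degraded operation.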

Technology selection criteria

The architecture follows design principles tailored for constrained environments:

  • Resource efficiency: Components exhibit predictable behavior under limited resources, avoiding solutions that require extensive node pools or high baseline consumption.
  • Declarative configuration: All components support native Kubernetes CRDs and GitOps workflows, eliminating imperative configuration steps.
  • Open-source ecosystem: Technologies maintain active communities and development without commercial license requirements, ensuring accessibility for labs and internal use.
  • Cloud-agnostic design: The stack runs on bare metal, on-premises virtualization, or any public cloud provider without modification.

The blueprint focuses on platform-layer concerns (networking, security, observability, policy, GitOps). Cloud-specific managed services are explicitly excluded to maintain portability.