Cloud Custodian, an open-source, stateless policy engine for managing public cloud environments, Kubernetes, and infrastructure as code, has reached its 10-year anniversary. Originally a cloud management tool, it is now an incubating CNCF project and is being positioned as a governance layer for agentic AI workloads.
Overview
Cloud Custodian provides a unified, YAML-based domain-specific language (DSL) that lets organizations define and enforce policies for FinOps, security, and compliance across AWS, Azure, GCP, Oracle Cloud, and Kubernetes. The engine is stateless: it evaluates resources against declared rules and can take automated actions (remediation, notification, deletion) without maintaining persistent state.
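A minimal sketch of what "stateless" means here, using a toy model rather than Cloud Custodian's actual internals: a policy is treated as a pure function from the current resource inventory to a list of planned actions, so nothing is stored between runs. The `Policy` class, the `evaluate` function, and the sample resources are hypothetical; filters in the list are combined with logical AND.

```python
from dataclasses import dataclass
from typing import Callable

Resource = dict
Filter = Callable[[Resource], bool]


@dataclass(frozen=True)
class Policy:
    """Hypothetical toy policy: a name, filters (all must match), and one action."""
    name: str
    filters: tuple[Filter, ...]
    action: str  # e.g. "stop", "notify", "delete"


def evaluate(policy: Policy, resources: list[Resource]) -> list[tuple[str, str]]:
    """Return (resource_id, action) pairs; reads no stored state and writes none."""
    return [
        (r["id"], policy.action)
        for r in resources
        if all(f(r) for f in policy.filters)
    ]


stop_untagged = Policy(
    name="stop-untagged-running",
    filters=(
        lambda r: r.get("state") == "running",
        lambda r: "owner" not in r.get("tags", {}),
    ),
    action="stop",
)

inventory = [
    {"id": "i-1", "state": "running", "tags": {}},
    {"id": "i-2", "state": "running", "tags": {"owner": "ml-team"}},
    {"id": "i-3", "state": "stopped", "tags": {}},
]

print(evaluate(stop_untagged, inventory))  # [('i-1', 'stop')]
# Same input, same output: nothing is remembered between runs.
assert evaluate(stop_untagged, inventory) == evaluate(stop_untagged, inventory)
```

Because the result depends only on the inputs, a scheduler can rerun such an evaluation at any interval, or in parallel, without coordinating shared state.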
What it does
Cloud Custodian's core function is declarative policy enforcement. Users write rules that describe the desired state of cloud resources; the engine then scans live environments and applies actions, such as stopping idle GPU fleets, deleting oversized storage tiers, or tagging untagged resources, to bring the environment into compliance. The project claims more than 10 million policy evaluations per week in production.
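Policies themselves are written in YAML with `name`, `resource`, `filters`, and `actions` keys. The sketch below holds a policy of that shape in a Python dict and interprets two filter forms in a simplified way: a dotted key path such as `State.Name` compared for equality, and a `tag:<Key>` lookup where `absent` means the tag is missing. The interpreter functions and sample data are illustrative assumptions, not Cloud Custodian's implementation, and the `tag` action is applied to local copies instead of calling a cloud API.

```python
from typing import Any

# Hypothetical policy, shaped like a Cloud Custodian YAML policy but held in a dict.
POLICY = {
    "name": "tag-untagged-instances",
    "resource": "aws.ec2",
    "filters": [
        {"State.Name": "running"},
        {"tag:Owner": "absent"},
    ],
    "actions": [{"type": "tag", "key": "Owner", "value": "unassigned"}],
}

_MISSING = object()  # sentinel for "key not found"


def lookup(resource: dict, key: str) -> Any:
    """Resolve 'tag:<Key>' from the Tags list, or a dotted path like 'State.Name'."""
    if key.startswith("tag:"):
        tag_key = key[len("tag:"):]
        for tag in resource.get("Tags", []):
            if tag["Key"] == tag_key:
                return tag["Value"]
        return _MISSING
    value: Any = resource
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def matches(resource: dict, flt: dict) -> bool:
    """Simplified value filter: 'absent'/'present' test existence, else equality."""
    for key, expected in flt.items():
        actual = lookup(resource, key)
        if expected == "absent":
            ok = actual is _MISSING
        elif expected == "present":
            ok = actual is not _MISSING
        else:
            ok = actual == expected
        if not ok:
            return False
    return True


def run(policy: dict, resources: list[dict]) -> list[dict]:
    """Return matching resources as copies with any 'tag' actions applied."""
    results = []
    for r in resources:
        if all(matches(r, f) for f in policy["filters"]):
            updated = {**r, "Tags": list(r.get("Tags", []))}  # copy; original untouched
            for action in policy["actions"]:
                if action["type"] == "tag":
                    updated["Tags"].append({"Key": action["key"], "Value": action["value"]})
            results.append(updated)
    return results


instances = [
    {"InstanceId": "i-a", "State": {"Name": "running"}, "Tags": []},
    {"InstanceId": "i-b", "State": {"Name": "running"},
     "Tags": [{"Key": "Owner", "Value": "ml"}]},
]
for r in run(POLICY, instances):
    print(r["InstanceId"], r["Tags"])  # i-a [{'Key': 'Owner', 'Value': 'unassigned'}]
```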
Why it matters for AI governance
With the rise of agentic AI, where autonomous agents generate and deploy infrastructure code, the speed of provisioning has outpaced human review cycles. Cloud Custodian acts as an automated safety net, enforcing organizational and industry best practices as soon as AI-generated resources are deployed. This closes the cost and security exposure windows that would otherwise stay open until someone reviews the resources manually.
AI workloads introduce specific risks: GPU fleets, model serving endpoints, and training pipelines expand the security attack surface and substantially increase cost exposure. Cloud Custodian's policies can target idle training jobs, oversized GPU instances, or misconfigured model endpoints.
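Idle detection usually comes down to a threshold on recent utilization metrics. A minimal sketch, assuming hourly GPU-utilization samples are already available: an instance counts as idle only if it has a full window of samples and their mean falls below a threshold. The `is_idle` function, the 5% threshold, the 24-sample window, and the fleet data are hypothetical; in Cloud Custodian's AWS provider, comparable threshold checks are typically written with the `metrics` filter, which reads CloudWatch data.

```python
from statistics import mean


def is_idle(samples: list[float], threshold_pct: float = 5.0, min_samples: int = 24) -> bool:
    """Flag an instance as idle if it has enough samples and their mean is below the threshold.

    Requiring min_samples avoids stopping a job that has only just started reporting.
    """
    if len(samples) < min_samples:
        return False
    return mean(samples) < threshold_pct


# Hypothetical hourly GPU utilization (%) per instance over the last day.
fleet = {
    "gpu-train-01": [0.0] * 20 + [2.0] * 4,  # mean ~0.33% -> idle
    "gpu-train-02": [85.0] * 24,             # busy
    "gpu-train-03": [0.0] * 6,               # too few samples to judge
}

to_stop = sorted(name for name, s in fleet.items() if is_idle(s))
print(to_stop)  # ['gpu-train-01']
```

The minimum-sample guard is a design choice: without it, a training job that started minutes ago and has reported only a few low readings could be stopped prematurely.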
Vendor neutrality and scalability
Cloud Custodian provides a single DSL that works across multiple cloud providers, preventing fragmented cost or security postures in complex multi-cloud AI workflows. The engine is designed for high-velocity environments, managing thousands of resources without the overhead of stateful management. A decade of production use has resulted in a library of thousands of community-vetted policy actions and filters.
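Cloud Custodian resource types are namespaced by provider (for example `aws.ec2`, `azure.vm`, and `gcp.instance`), which is how one policy grammar can span clouds. The sketch below illustrates that routing idea under simplifying assumptions: hypothetical per-provider fetchers return canned data with tags already normalized into a plain dict, and one shared filter is applied to every provider's results. Real providers represent tags and labels differently, so the normalization is an assumption of the sketch, not Cloud Custodian behavior.

```python
from typing import Callable


# Hypothetical provider "plugins": each returns resources for one resource type.
# Real integrations would call each cloud's API; these return canned, normalized data.
def fetch_aws(resource_type: str) -> list[dict]:
    return [{"id": "i-1", "region": "us-east-1", "tags": {}}]


def fetch_azure(resource_type: str) -> list[dict]:
    return [{"id": "vm-1", "region": "eastus", "tags": {"owner": "data"}}]


def fetch_gcp(resource_type: str) -> list[dict]:
    return [{"id": "gce-1", "region": "us-central1", "tags": {}}]


PROVIDERS: dict[str, Callable[[str], list[dict]]] = {
    "aws": fetch_aws,
    "azure": fetch_azure,
    "gcp": fetch_gcp,
}


def missing_owner(resource: dict) -> bool:
    """One shared filter, reused unchanged for every provider."""
    return "owner" not in resource["tags"]


def run(resource: str, flt: Callable[[dict], bool]) -> list[str]:
    """Route 'provider.type' to its fetcher, then apply the shared filter."""
    provider, _, resource_type = resource.partition(".")
    fetch = PROVIDERS[provider]
    return [r["id"] for r in fetch(resource_type) if flt(r)]


for rtype in ("aws.ec2", "azure.vm", "gcp.instance"):
    print(rtype, run(rtype, missing_owner))
# aws.ec2 ['i-1']
# azure.vm []
# gcp.instance ['gce-1']
```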
Tradeoffs
Cloud Custodian is a policy engine, not an identity or access management system. It enforces rules on resources that have already been provisioned; it does not block provisioning at the API gateway level. Organizations using it for AI governance still need to integrate it into GitOps pipelines and CI/CD workflows to catch issues before they reach production. The YAML DSL is expressive but has a learning curve: teams must learn its resource types, filters, and actions before they can write effective policies.
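One way to act on the CI/CD point, sketched under assumptions: a pipeline step that checks the resources a change plans to create and fails the build on violations, before anything is provisioned. The planned-resource format, the `MAX_GPUS` limit, the cost-center rule, and the `violations` and `gate` functions are hypothetical and are not part of Cloud Custodian.

```python
import sys

# Hypothetical guardrails: cap GPUs per planned instance and require a cost-center tag.
MAX_GPUS = 8


def violations(planned: list[dict]) -> list[str]:
    """Return human-readable violations for resources a change would create."""
    problems = []
    for r in planned:
        if r.get("gpus", 0) > MAX_GPUS:
            problems.append(f"{r['name']}: {r['gpus']} GPUs exceeds limit of {MAX_GPUS}")
        if "cost-center" not in r.get("tags", {}):
            problems.append(f"{r['name']}: missing cost-center tag")
    return problems


def gate(planned: list[dict]) -> int:
    """Print violations to stderr and return a CI exit code (0 = pass, 1 = fail)."""
    problems = violations(planned)
    for p in problems:
        print(p, file=sys.stderr)
    return 1 if problems else 0


# Hypothetical plan output for an AI training change.
plan = [
    {"name": "trainer", "gpus": 16, "tags": {"cost-center": "ml"}},
    {"name": "endpoint", "gpus": 1, "tags": {}},
]

print("exit code:", gate(plan))  # exit code: 1
```

In a real pipeline the script would end with `sys.exit(gate(plan))` so that a non-zero code fails the job; the sketch prints the code instead so it can be run interactively.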
Bottom line
Cloud Custodian has transitioned from a cloud management tool into a cost optimization and safety layer for the AI era. Its declarative, stateless design and multi-cloud support make it a practical choice for enterprises that need automated guardrails on AI-provisioned infrastructure. The project's 10-year track record and CNCF incubation status give it a maturity that newer governance tools have not yet had time to establish.