Azure Policy as Code: How to Govern Cloud Resources at Scale Without Losing Your Mind

If you’ve spent any time managing a non-trivial Azure environment, you’ve probably hit the same wall: things drift. Someone creates a storage account without encryption at rest. A subscription gets spun up without a cost center tag. A VM lands in a region you’re not supposed to use. Manual reviews catch some of it, but not all of it — and by the time you catch it, the problem has already been live for weeks.

Azure Policy offers a solution, but clicking through the Azure portal to define and assign policies one at a time doesn’t scale. The moment you have more than a handful of subscriptions or a team larger than one person, you need something more disciplined. That’s where Policy as Code (PaC) comes in.

This guide walks through what Policy as Code means for Azure, how to structure a working repository, the key operational decisions you’ll need to make, and how to wire it all into a CI/CD pipeline so governance is automatic — not an afterthought.


What “Policy as Code” Actually Means

The phrase sounds abstract, but the idea is simple: instead of managing your Azure Policies through the portal, you store them in a Git repository as JSON or Bicep files, version-control them like any other infrastructure code, and deploy them through an automated pipeline.

This matters for several reasons.

First, Git history becomes your audit trail. Every policy change, every exemption, every assignment — it’s all tracked with who changed it, when, and why (assuming your team writes decent commit messages). That’s something the portal can never give you.

Second, you can enforce peer review. If someone wants to create a new “allowed locations” policy or relax an existing deny effect, they open a pull request. Your team reviews it before it goes anywhere near production.

Third, you get consistency across environments. A staging environment governed by a slightly different set of policies than production is a gap waiting to become an incident. Policy as Code makes it easy to parameterize for environment differences without maintaining completely separate policy definitions.

Structuring Your Policy Repository

There’s no single right structure, but a layout that has worked well across a variety of team sizes looks something like this:

azure-policy/
  policies/
    definitions/
      storage-require-https.json
      require-resource-tags.json
      allowed-vm-skus.json
    initiatives/
      security-baseline.json
      tagging-standards.json
  assignments/
    subscription-prod.json
    subscription-dev.json
    management-group-root.json
  exemptions/
    storage-legacy-project-x.json
  scripts/
    deploy.ps1
    test.ps1
  .github/
    workflows/
      policy-deploy.yml

Policy definitions live in policies/definitions/ — these are the raw policy rule files. Initiatives (policy sets) group related definitions together in policies/initiatives/. Assignments connect initiatives or individual policies to scopes (subscriptions, management groups, resource groups) and live in assignments/. Exemptions are tracked separately so they’re visible and reviewable rather than buried in portal configuration.

Writing a Solid Policy Definition

A policy definition file is JSON with a few key sections: displayName, description, mode, parameters, and policyRule. Here’s a practical example — requiring that all storage accounts enforce HTTPS-only traffic:

{
  "displayName": "Storage accounts should require HTTPS-only traffic",
  "description": "Ensures that all Azure Storage accounts are configured with supportsHttpsTrafficOnly set to true.",
  "mode": "Indexed",
  "parameters": {
    "effect": {
      "type": "String",
      "defaultValue": "Audit",
      "allowedValues": ["Audit", "Deny", "Disabled"]
    }
  },
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Storage/storageAccounts"
        },
        {
          "field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly",
          "notEquals": true
        }
      ]
    },
    "then": {
      "effect": "[parameters('effect')]"
    }
  }
}

A few design choices worth noting. The effect is parameterized — this lets you assign the same definition with Audit in dev (to surface violations without blocking) and Deny in production (to actively block non-compliant resources). Hardcoding the effect is a common early mistake that forces you to maintain duplicate definitions for different environments.

The mode of Indexed means this policy only evaluates resource types that support tags and location. For policies targeting resource group properties or subscription-level resources, use All instead.

Grouping Policies into Initiatives

Individual policy definitions are powerful, but assigning them one at a time to every subscription is tedious and error-prone. Initiatives (also called policy sets) let you bundle related policies and assign the whole bundle at once.

A tagging standards initiative might group together policies for requiring a cost-center tag, requiring an owner tag, and inheriting tags from the resource group. An initiative like this assigns cleanly at the management group level, propagates down to all subscriptions, and can be updated in one place when your tagging requirements change.

Define your initiatives in a JSON file and reference the policy definitions by their IDs. When you deploy via the pipeline, definitions go up first, then initiatives get built from them, then assignments connect initiatives to scopes — order matters.

Testing Policies Before They Touch Production

There are two kinds of pain with policy governance: violations you catch before deployment, and violations you discover after. Policy as Code should maximize the first kind.

Linting and schema validation can run in your CI pipeline on every pull request. Tools like the Azure Policy VS Code extension or Bicep’s built-in linter catch structural errors before they ever reach Azure.

What-if analysis is available for some deployment scenarios. More practically, deploy to a dedicated governance test subscription first. Assign your policy with Audit effect, then run your compliance scripts and check the compliance report. If expected-compliant resources show as non-compliant, your policy logic has a bug.

Exemptions are another testing tool — if a specific resource legitimately needs to be excluded from a policy (legacy system, approved exception, temporary dev environment), track that exemption in your repo with a documented justification and expiry date. Exemptions that live only in the portal are invisible and tend to become permanent by accident.

Wiring Policy Deployment into CI/CD

A minimal GitHub Actions workflow for policy deployment looks something like this:

name: Deploy Azure Policies

on:
  push:
    branches: [main]
    paths:
      - 'policies/**'
      - 'assignments/**'
      - 'exemptions/**'
  pull_request:
    branches: [main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate policy JSON
        run: |
          find policies/ -name '*.json' | xargs -I {} python3 -c "import json,sys; json.load(open('{}'))" && echo "All JSON valid"

  deploy:
    runs-on: ubuntu-latest
    needs: validate
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy policy definitions
        run: ./scripts/deploy.ps1 -Stage definitions
      - name: Deploy initiatives
        run: ./scripts/deploy.ps1 -Stage initiatives
      - name: Deploy assignments
        run: ./scripts/deploy.ps1 -Stage assignments

The key pattern: pull requests trigger validation only. Merges to main trigger the actual deployment. Policy changes that bypass review by going directly to main can be prevented with branch protection rules.

For Azure DevOps shops, the same pattern applies using pipeline YAML with environment gates — require a manual approval before the assignment stage runs in production if your organization needs that extra checkpoint.

Common Pitfalls Worth Avoiding

Starting with Deny effects. The first instinct when you see a compliance gap is to block it immediately. Resist this. Start every new policy with Audit for at least two weeks. Let the compliance data show you what’s actually out of compliance before you start blocking things. Blocking before you understand the landscape leads to surprised developers and emergency exemptions.

Scope creep in initiatives. It’s tempting to build one giant “everything” initiative. Don’t. Break initiatives into logical domains — security baseline, tagging standards, allowed regions, allowed SKUs. Smaller initiatives are easier to update, easier to understand, and easier to exempt selectively when needed.

Not versioning your initiatives. When you change an initiative — adding a new policy, changing parameters — update the initiative’s display name and maintain a changelog. Initiatives that silently change are hard to reason about in compliance reports.

Forgetting inherited policies. If you’re working in a larger organization where your management group already has policies assigned from above, those assignments interact with yours. Map the existing policy landscape before you assign new policies, especially deny-effect ones, to avoid conflicts or redundant coverage.

Not cleaning up exemptions. Exemptions with no expiry date live forever. Add an expiry review process — even a simple monthly script that lists exemptions older than 90 days — and review whether they’re still justified.

Getting Started Without Boiling the Ocean

If you’re starting from scratch, a practical week-one scope is:

  1. Pick three policies you know you need: require encryption at rest on storage accounts, require tags on resource groups, deny resources in non-approved regions.
  2. Stand up a policy repo with the folder structure above.
  3. Deploy with Audit effect to a dev subscription.
  4. Fix the real violations you find rather than exempting them.
  5. Set up the CI/CD pipeline so future changes require a pull request.

That scope is small enough to finish and large enough to prove the value. From there, building out a full security baseline initiative and expanding to production becomes a natural next step rather than a daunting project.

Policy as Code isn’t glamorous, but it’s the difference between a cloud environment that drifts toward chaos and one that stays governable as it grows. The portal will always let you click things in. The question is whether anyone will know what got clicked, why, or whether it’s still correct six months later. Code and version control answer all three.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *