  • Azure Policy as Code: How to Govern Cloud Resources at Scale Without Losing Your Mind

    If you’ve spent any time managing a non-trivial Azure environment, you’ve probably hit the same wall: things drift. Someone creates a storage account that still accepts plain HTTP traffic. A subscription gets spun up without a cost center tag. A VM lands in a region you’re not supposed to use. Manual reviews catch some of it, but not all of it, and by the time you catch it, the problem has already been live for weeks.

    Azure Policy offers a solution, but clicking through the Azure portal to define and assign policies one at a time doesn’t scale. The moment you have more than a handful of subscriptions or a team larger than one person, you need something more disciplined. That’s where Policy as Code (PaC) comes in.

    This guide walks through what Policy as Code means for Azure, how to structure a working repository, the key operational decisions you’ll need to make, and how to wire it all into a CI/CD pipeline so governance is automatic — not an afterthought.


    What “Policy as Code” Actually Means

    The phrase sounds abstract, but the idea is simple: instead of managing your Azure Policies through the portal, you store them in a Git repository as JSON or Bicep files, version-control them like any other infrastructure code, and deploy them through an automated pipeline.

    This matters for several reasons.

    First, Git history becomes your audit trail. Every policy change, every exemption, every assignment is tracked with who changed it, when, and why (assuming your team writes decent commit messages). The Azure activity log can tell you that something changed and who changed it, but not the reasoning or the review behind it.

    Second, you can enforce peer review. If someone wants to create a new “allowed locations” policy or relax an existing deny effect, they open a pull request. Your team reviews it before it goes anywhere near production.

    Third, you get consistency across environments. A staging environment governed by a slightly different set of policies than production is a gap waiting to become an incident. Policy as Code makes it easy to parameterize for environment differences without maintaining completely separate policy definitions.

    Structuring Your Policy Repository

    There’s no single right structure, but a layout that has worked well across a variety of team sizes looks something like this:

    azure-policy/
      policies/
        definitions/
          storage-require-https.json
          require-resource-tags.json
          allowed-vm-skus.json
        initiatives/
          security-baseline.json
          tagging-standards.json
      assignments/
        subscription-prod.json
        subscription-dev.json
        management-group-root.json
      exemptions/
        storage-legacy-project-x.json
      scripts/
        deploy.ps1
        test.ps1
      .github/
        workflows/
          policy-deploy.yml

    Policy definitions live in policies/definitions/ — these are the raw policy rule files. Initiatives (policy sets) group related definitions together in policies/initiatives/. Assignments connect initiatives or individual policies to scopes (subscriptions, management groups, resource groups) and live in assignments/. Exemptions are tracked separately so they’re visible and reviewable rather than buried in portal configuration.

    Writing a Solid Policy Definition

    A policy definition file is JSON with a few key sections: displayName, description, mode, parameters, and policyRule. Here’s a practical example — requiring that all storage accounts enforce HTTPS-only traffic:

    {
      "displayName": "Storage accounts should require HTTPS-only traffic",
      "description": "Ensures that all Azure Storage accounts are configured with supportsHttpsTrafficOnly set to true.",
      "mode": "Indexed",
      "parameters": {
        "effect": {
          "type": "String",
          "defaultValue": "Audit",
          "allowedValues": ["Audit", "Deny", "Disabled"]
        }
      },
      "policyRule": {
        "if": {
          "allOf": [
            {
              "field": "type",
              "equals": "Microsoft.Storage/storageAccounts"
            },
            {
              "field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly",
              "notEquals": true
            }
          ]
        },
        "then": {
          "effect": "[parameters('effect')]"
        }
      }
    }

    A few design choices worth noting. The effect is parameterized — this lets you assign the same definition with Audit in dev (to surface violations without blocking) and Deny in production (to actively block non-compliant resources). Hardcoding the effect is a common early mistake that forces you to maintain duplicate definitions for different environments.
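
    To make the parameterized effect concrete, here is a minimal Python sketch of how a deploy script might produce two assignments from the one definition. The assignment shape (policyDefinitionId plus a parameters object with a value per parameter) follows the ARM assignment format; the environment names, the build_assignment helper, and the placeholder definition ID are assumptions for illustration.

```python
import json

# Hypothetical mapping from environment name to the effect assigned there.
EFFECT_BY_ENVIRONMENT = {"dev": "Audit", "prod": "Deny"}

# Mirrors the allowedValues list in the definition above.
ALLOWED_EFFECTS = ("Audit", "Deny", "Disabled")


def build_assignment(environment, definition_id):
    """Return assignment properties that set the effect for one environment."""
    effect = EFFECT_BY_ENVIRONMENT[environment]
    if effect not in ALLOWED_EFFECTS:
        raise ValueError(f"{effect!r} is not an allowed effect")
    return {
        "displayName": f"Storage HTTPS-only ({environment})",
        "policyDefinitionId": definition_id,
        "parameters": {"effect": {"value": effect}},
    }


# Placeholder ID; a real one starts with the scope that owns the definition.
DEFINITION_ID = (
    "/providers/Microsoft.Management/managementGroups/<management-group>"
    "/providers/Microsoft.Authorization/policyDefinitions/storage-require-https"
)

dev_assignment = build_assignment("dev", DEFINITION_ID)
prod_assignment = build_assignment("prod", DEFINITION_ID)
print(json.dumps(prod_assignment, indent=2))
```

    Both assignments point at the same definition ID, so a fix to the rule reaches every environment at once.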

    The mode of Indexed means this policy only evaluates resource types that support tags and location. For policies targeting resource group properties or subscription-level resources, use All instead.
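
    The mode mismatch is easy to miss in review, so it can be worth a small automated check. The sketch below is one possible approach, not an official tool: it walks a rule for conditions that compare the type field to a resource group or subscription type and flags the definition if its mode is not All. The function names and the sample definition are hypothetical, and it only handles simple equals comparisons.

```python
# Resource types that need mode All, per the paragraph above (lowercased).
TYPES_REQUIRING_MODE_ALL = {
    "microsoft.resources/subscriptions",
    "microsoft.resources/subscriptions/resourcegroups",
}


def targeted_types(node):
    """Yield each resource type that the rule compares to the 'type' field."""
    if isinstance(node, dict):
        if node.get("field") == "type" and isinstance(node.get("equals"), str):
            yield node["equals"].lower()
        for value in node.values():
            yield from targeted_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from targeted_types(item)


def mode_problems(definition):
    """Return messages for rules that target such types without mode All."""
    if definition.get("mode") == "All":
        return []
    rule_if = definition.get("policyRule", {}).get("if", {})
    return [
        f"{resource_type} is not evaluated in mode {definition.get('mode')!r}"
        for resource_type in targeted_types(rule_if)
        if resource_type in TYPES_REQUIRING_MODE_ALL
    ]


# Hypothetical definition that tags resource groups but uses the wrong mode.
resource_group_tag_policy = {
    "mode": "Indexed",
    "policyRule": {
        "if": {
            "allOf": [
                {"field": "type", "equals": "Microsoft.Resources/subscriptions/resourceGroups"},
                {"field": "tags['cost-center']", "exists": "false"},
            ]
        },
        "then": {"effect": "[parameters('effect')]"},
    },
}

print(mode_problems(resource_group_tag_policy))
```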

    Grouping Policies into Initiatives

    Individual policy definitions are powerful, but assigning them one at a time to every subscription is tedious and error-prone. Initiatives (also called policy sets) let you bundle related policies and assign the whole bundle at once.

    A tagging standards initiative might group together policies for requiring a cost-center tag, requiring an owner tag, and inheriting tags from the resource group. An initiative like this assigns cleanly at the management group level, propagates down to all subscriptions, and can be updated in one place when your tagging requirements change.

    Define your initiatives in a JSON file and reference the policy definitions by their IDs. When you deploy via the pipeline, definitions go up first, then initiatives get built from them, then assignments connect initiatives to scopes — order matters.
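
    The ordering constraint shows up naturally if the initiative is assembled from the IDs returned by the definitions stage. This is a rough Python sketch under that assumption: the policyDefinitions array with policyDefinitionId and policyDefinitionReferenceId follows the ARM initiative format, while build_initiative, the deployed mapping, and the placeholder scope are hypothetical.

```python
def build_initiative(display_name, definition_names, known_definition_ids):
    """Build initiative properties, failing fast on undeployed definitions."""
    missing = [name for name in definition_names if name not in known_definition_ids]
    if missing:
        raise KeyError(f"definitions not deployed yet: {', '.join(missing)}")
    return {
        "displayName": display_name,
        "policyDefinitions": [
            {
                "policyDefinitionReferenceId": name,
                "policyDefinitionId": known_definition_ids[name],
            }
            for name in definition_names
        ],
    }


# Hypothetical output of the definitions stage: file name -> deployed ID.
deployed = {
    "require-resource-tags": (
        "<scope>/providers/Microsoft.Authorization"
        "/policyDefinitions/require-resource-tags"
    ),
}

tagging = build_initiative("Tagging standards", ["require-resource-tags"], deployed)
print(len(tagging["policyDefinitions"]))
```

    Asking for a definition that has not been deployed raises a KeyError, which is the failure you want to see in the pipeline rather than halfway through an Azure deployment.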

    Testing Policies Before They Touch Production

    There are two kinds of pain with policy governance: violations you catch before deployment, and violations you discover after. Policy as Code should maximize the first kind.

    Linting and schema validation can run in your CI pipeline on every pull request. Tools like the Azure Policy VS Code extension or Bicep’s built-in linter catch structural errors before they ever reach Azure.
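
    Beyond checking that the JSON parses, a pipeline can enforce a few house rules. The following Python sketch shows one way to do that, assuming the flat file layout used in this guide; the required-key list and the "effect must be parameterized" rule are team conventions chosen for illustration, not Azure requirements.

```python
REQUIRED_KEYS = ("displayName", "description", "mode", "parameters", "policyRule")


def lint_definition(definition):
    """Return a list of structural problems; an empty list means it passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in definition]
    rule = definition.get("policyRule", {})
    for part in ("if", "then"):
        if part not in rule:
            problems.append(f"policyRule has no '{part}' block")
    effect = rule.get("then", {}).get("effect")
    if isinstance(effect, str) and not effect.startswith("[parameters("):
        problems.append(f"effect {effect!r} is hardcoded; parameterize it")
    return problems


# Hypothetical definition that parses fine but hardcodes its effect.
hardcoded = {
    "displayName": "Example",
    "description": "Example definition with a hardcoded effect.",
    "mode": "Indexed",
    "parameters": {},
    "policyRule": {
        "if": {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
        "then": {"effect": "Deny"},
    },
}

for problem in lint_definition(hardcoded):
    print(problem)
```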

    ARM deployment what-if can preview the resource changes a template would make, but it won’t tell you how a new policy scores the resources you already have. More practically, deploy to a dedicated governance test subscription first. Assign your policy with the Audit effect, trigger a compliance scan (az policy state trigger-scan starts one on demand), and check the compliance report. If resources you expect to be compliant show as non-compliant, your policy logic has a bug.
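
    Some logic bugs can be caught even earlier with a local approximation of the rule. The sketch below is a deliberately tiny stand-in for the real evaluation engine, not a reimplementation of it: it supports only allOf, anyOf, not, equals, and notEquals, and it models a resource as a flat dictionary keyed by field name, which is a simplifying assumption. It is useful for unit-testing the shape of a condition, not for predicting Azure’s verdict in every case.

```python
def _same(left, right):
    """Compare values, ignoring case for strings."""
    if isinstance(left, str) and isinstance(right, str):
        return left.lower() == right.lower()
    return left == right


def matches(condition, resource):
    """Return True when the resource satisfies the 'if' condition."""
    if "allOf" in condition:
        return all(matches(child, resource) for child in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(child, resource) for child in condition["anyOf"])
    if "not" in condition:
        return not matches(condition["not"], resource)
    value = resource.get(condition["field"])  # missing fields read as None
    if "equals" in condition:
        return _same(value, condition["equals"])
    if "notEquals" in condition:
        return not _same(value, condition["notEquals"])
    raise NotImplementedError(f"unsupported condition: {condition}")


HTTPS_FIELD = "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly"

# The 'if' block from the HTTPS-only definition earlier in this guide.
rule_if = {
    "allOf": [
        {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
        {"field": HTTPS_FIELD, "notEquals": True},
    ]
}

compliant = {"type": "Microsoft.Storage/storageAccounts", HTTPS_FIELD: True}
violating = {"type": "Microsoft.Storage/storageAccounts", HTTPS_FIELD: False}

print("compliant flagged:", matches(rule_if, compliant))
print("violating flagged:", matches(rule_if, violating))
```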

    Exemptions belong in the same workflow. If a specific resource legitimately needs to be excluded from a policy (legacy system, approved exception, temporary dev environment), track that exemption in your repo with a documented justification and expiry date. Exemptions that live only in the portal are invisible and tend to become permanent by accident.
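
    A pull request check can hold exemption files to that standard. Here is a minimal Python sketch, assuming each file stores the exemption fields in a flat object: exemptionCategory, description, and expiresOn are real exemption properties, while the exemption_problems helper and the sample record are hypothetical.

```python
from datetime import datetime, timezone

VALID_CATEGORIES = ("Waiver", "Mitigated")


def exemption_problems(exemption, now):
    """Return review problems for one exemption file's contents."""
    problems = []
    if exemption.get("exemptionCategory") not in VALID_CATEGORIES:
        problems.append("exemptionCategory must be Waiver or Mitigated")
    if not exemption.get("description", "").strip():
        problems.append("description (the justification) is empty")
    expires_on = exemption.get("expiresOn")
    if not expires_on:
        problems.append("expiresOn is missing")
    else:
        expiry = datetime.fromisoformat(expires_on.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)  # treat naive as UTC
        if expiry <= now:
            problems.append(f"expired on {expires_on}")
    return problems


# A fixed "now" keeps the example repeatable; a real check would use the clock.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
legacy = {
    "policyAssignmentId": "<assignment-id>",
    "exemptionCategory": "Waiver",
    "description": "",
    "expiresOn": "2024-12-31T00:00:00Z",
}
print(exemption_problems(legacy, now))
```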

    Wiring Policy Deployment into CI/CD

    A minimal GitHub Actions workflow for policy deployment looks something like this:

    name: Deploy Azure Policies
    
    on:
      push:
        branches: [main]
        paths:
          - 'policies/**'
          - 'assignments/**'
          - 'exemptions/**'
      pull_request:
        branches: [main]
    
    jobs:
      validate:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
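          - name: Validate policy JSON
            run: |
              # Parse every JSON file the deploy stages consume; any parse error fails the job.
              find policies assignments exemptions -name '*.json' -print0 \
                | xargs -0 python3 -c "import json, sys; [json.load(open(p)) for p in sys.argv[1:]]" \
                && echo "All JSON valid"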
    
      deploy:
        runs-on: ubuntu-latest
        needs: validate
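        if: github.ref == 'refs/heads/main'
        # deploy.ps1 is a PowerShell script, so run steps in pwsh instead of the default bash.
        defaults:
          run:
            shell: pwsh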
        steps:
          - uses: actions/checkout@v4
          - uses: azure/login@v2
            with:
              creds: ${{ secrets.AZURE_CREDENTIALS }}
          - name: Deploy policy definitions
            run: ./scripts/deploy.ps1 -Stage definitions
          - name: Deploy initiatives
            run: ./scripts/deploy.ps1 -Stage initiatives
          - name: Deploy assignments
            run: ./scripts/deploy.ps1 -Stage assignments

    The key pattern: pull requests trigger validation only, and merges to main trigger the actual deployment. Branch protection rules on main close the remaining gap by blocking direct pushes that would bypass review.

    For Azure DevOps shops, the same pattern applies using pipeline YAML with environment gates — require a manual approval before the assignment stage runs in production if your organization needs that extra checkpoint.

    Common Pitfalls Worth Avoiding

    Starting with Deny effects. The first instinct when you see a compliance gap is to block it immediately. Resist this. Start every new policy with Audit for at least two weeks. Let the compliance data show you what’s actually out of compliance before you start blocking things. Blocking before you understand the landscape leads to surprised developers and emergency exemptions.

    Scope creep in initiatives. It’s tempting to build one giant “everything” initiative. Don’t. Break initiatives into logical domains — security baseline, tagging standards, allowed regions, allowed SKUs. Smaller initiatives are easier to update, easier to understand, and easier to exempt selectively when needed.

    Not versioning your initiatives. When you change an initiative by adding a policy or changing parameters, record the new version in the initiative’s display name or, following the convention of Azure’s built-in definitions, in metadata.version, and maintain a changelog. Initiatives that silently change are hard to reason about in compliance reports.

    Forgetting inherited policies. If you’re working in a larger organization where your management group already has policies assigned from above, those assignments interact with yours. Map the existing policy landscape before you assign new policies, especially deny-effect ones, to avoid conflicts or redundant coverage.
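
    Mapping the landscape is mostly a matter of walking up the scope hierarchy. The sketch below illustrates the idea in Python with an invented hierarchy and invented assignment names; in practice the inputs would come from an export of your management groups and their assignments.

```python
# Hypothetical hierarchy: each scope maps to its parent (None marks the root).
PARENTS = {"sub-prod": "mg-platform", "mg-platform": "mg-root", "mg-root": None}

# Hypothetical existing assignments, keyed by the scope where they were made.
ASSIGNMENTS = {
    "mg-root": ["allowed-locations"],
    "mg-platform": ["security-baseline"],
    "sub-prod": ["tagging-standards"],
}


def effective_assignments(scope, parents, assignments):
    """Return (scope, assignment) pairs that apply at a scope, nearest first."""
    found = []
    while scope is not None:
        found.extend((scope, name) for name in assignments.get(scope, []))
        scope = parents.get(scope)
    return found


def already_covered(candidate, scope, parents, assignments):
    """Return the scopes where the candidate assignment already applies."""
    return [
        origin
        for origin, name in effective_assignments(scope, parents, assignments)
        if name == candidate
    ]


print(effective_assignments("sub-prod", PARENTS, ASSIGNMENTS))
print(already_covered("allowed-locations", "sub-prod", PARENTS, ASSIGNMENTS))
```

    Running a check like already_covered before adding an assignment at a subscription shows whether a parent scope already enforces the same thing.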

    Not cleaning up exemptions. Exemptions with no expiry date live forever. Add an expiry review process — even a simple monthly script that lists exemptions older than 90 days — and review whether they’re still justified.
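
    The monthly review script can be very small. This Python sketch assumes each exemption record carries a creation date in a hypothetical createdOn field (your repo files or an export would need to supply it) and lists everything older than the 90-day threshold, oldest first.

```python
from datetime import date

MAX_AGE_DAYS = 90


def stale_exemptions(exemptions, today, max_age_days=MAX_AGE_DAYS):
    """Return (name, age_in_days) pairs older than the threshold, oldest first."""
    aged = [
        (item["name"], (today - date.fromisoformat(item["createdOn"])).days)
        for item in exemptions
    ]
    return sorted(
        (pair for pair in aged if pair[1] > max_age_days),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical records; createdOn is an assumed field, not an Azure property.
exemptions = [
    {"name": "storage-legacy-project-x", "createdOn": "2024-01-10"},
    {"name": "temporary-dev-sandbox", "createdOn": "2024-06-01"},
]

for name, age in stale_exemptions(exemptions, today=date(2024, 6, 30)):
    print(f"{name}: {age} days old")
```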

    Getting Started Without Boiling the Ocean

    If you’re starting from scratch, a practical week-one scope is:

    1. Pick three policies you know you need: require HTTPS-only traffic on storage accounts, require tags on resource groups, deny resources in non-approved regions.
    2. Stand up a policy repo with the folder structure above.
    3. Deploy with Audit effect to a dev subscription.
    4. Fix the real violations you find rather than exempting them.
    5. Set up the CI/CD pipeline so future changes require a pull request.

    That scope is small enough to finish and large enough to prove the value. From there, building out a full security baseline initiative and expanding to production becomes a natural next step rather than a daunting project.

    Policy as Code isn’t glamorous, but it’s the difference between a cloud environment that drifts toward chaos and one that stays governable as it grows. The portal will always let you click things in. The question is whether anyone will know what got clicked, why, or whether it’s still correct six months later. Code and version control answer all three.

  • How to Use Azure Policy Without Turning Governance Into a Developer Tax

    Azure Policy is one of those tools that can either make a cloud estate safer and easier to manage, or make every engineering team feel like governance exists to slow them down. The difference is not the feature set. The difference is how you use it. When policy is introduced as a wall of denials with no rollout plan, teams work around it, deployments fail late, and governance earns a bad reputation. When it is used as a staged operating model, it becomes one of the most practical ways to raise standards without creating unnecessary friction.

    Start with visibility before enforcement

    The fastest way to turn Azure Policy into a developer tax is to begin with broad deny rules across subscriptions that already contain drift, exceptions, and legacy workloads. A better approach is to start with audit-focused initiatives that show what is happening today. Teams need a baseline before they can improve it. Platform owners also need evidence about where the biggest risks actually are, instead of assuming every standard should be enforced immediately.

    This visibility-first phase does two useful things. First, it surfaces repeat problems such as untagged resources, public endpoints, or unsupported SKUs. Second, it gives you concrete data for prioritization. If a rule only affects a small corner of the estate, it does not deserve the same rollout energy as a control that improves backup coverage, identity hygiene, or network exposure across dozens of workloads.
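
    As a rough illustration of that prioritization step, the Python sketch below ranks policies by how many distinct resources they flag. The findings list is invented sample data in an assumed (policy, resource) shape; a real run would feed in an export of audit results.

```python
from collections import defaultdict


def rank_by_impact(findings):
    """Return (policy, affected_resource_count) pairs, most widespread first."""
    resources_by_policy = defaultdict(set)
    for policy, resource_id in findings:
        resources_by_policy[policy].add(resource_id)
    return sorted(
        ((policy, len(ids)) for policy, ids in resources_by_policy.items()),
        key=lambda pair: (-pair[1], pair[0]),  # ties break alphabetically
    )


# Hypothetical audit findings: (policy name, non-compliant resource).
findings = [
    ("require-cost-center-tag", "vm-01"),
    ("require-cost-center-tag", "vm-02"),
    ("require-cost-center-tag", "storage-01"),
    ("no-public-endpoints", "storage-01"),
    ("allowed-vm-skus", "vm-02"),
]

for policy, count in rank_by_impact(findings):
    print(f"{policy}: {count} resources")
```

    A raw count is only a starting point. A control with few findings but high risk can still deserve to go first.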

    Write policies around platform standards, not one-off preferences

    Strong governance comes from standardizing the things that should be predictable across the platform. Naming patterns, required tags, approved regions, private networking expectations, managed identity usage, and logging destinations are all good candidates because they reduce ambiguity and improve operations. Weak governance happens when policy gets used to encode every opinion an administrator has ever had. That creates clutter, exceptions, and resistance.

    If a standard matters enough to enforce, it should also exist outside the policy engine. It should be visible in landing zone documentation, infrastructure-as-code modules, architecture patterns, and deployment examples. Policy works best as the safety net behind a clear paved road. If teams can only discover a rule after a deployment fails, governance has already arrived too late.

    Use initiatives to express intent at the right level

    Individual policy definitions are useful building blocks, but initiatives are where governance starts to feel operationally coherent. Grouping related policies into initiatives makes it easier to align controls with business goals like secure networking, cost discipline, or data protection. It also simplifies assignment and reporting because stakeholders can discuss the outcome they want instead of memorizing a list of disconnected rule names.

    • A baseline initiative for core platform hygiene such as tags, approved regions, and diagnostics.
    • A security initiative for identity, network exposure, encryption, and monitoring expectations.
    • An application delivery initiative for approved service patterns, backup settings, and deployment guardrails.

    The list matters less than the structure. Teams respond better when governance feels organized and purposeful. They respond poorly when every assignment looks like a random pile of rules added over time.

    Pair deny policies with a clean exception process

    Deny policies have an important place, especially for high-risk issues that should never make it into production. But the moment you enforce them, you need a legitimate path for handling edge cases. Otherwise, engineers will treat the platform team as a ticket queue whose main job is approving bypasses. A clean exception process should define who can approve a waiver, how long it lasts, what compensating controls are expected, and how it gets reviewed later.

    This is where governance maturity shows up. Good policy programs do not pretend exceptions will disappear. They make exceptions visible, temporary, and expensive enough that teams only request them when they genuinely need them. That protects standards without ignoring real-world delivery pressure.

    Shift compliance feedback left into delivery pipelines

    Even a well-designed policy set becomes frustrating if developers only encounter it at deployment time in a shared subscription. The better pattern is to surface likely violations earlier through templates, pre-deployment validation, CI checks, and standardized modules. When teams can see policy expectations before the final deployment stage, they spend less time debugging avoidable issues and more time shipping working systems.

    In practical terms, this usually means platform teams invest in reusable Bicep or Terraform modules, example repositories, and pipeline steps that mirror the same standards enforced in Azure. Governance becomes cheaper when compliance is the default path rather than a separate clean-up exercise after a failed release.
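
    A pipeline step that mirrors the standards can be as simple as the Python sketch below. It assumes a template already loaded as a dictionary with an ARM-style resources list, and the required tags and approved regions are placeholder standards. It only handles literal values, so a location written as a template expression would need to be resolved first.

```python
REQUIRED_TAGS = {"cost-center", "owner"}          # hypothetical tagging standard
APPROVED_REGIONS = {"westeurope", "northeurope"}  # hypothetical region list


def precheck(template):
    """Return violations found in an ARM-style template's resources."""
    violations = []
    for resource in template.get("resources", []):
        name = resource.get("name", "<unnamed>")
        missing = REQUIRED_TAGS - set(resource.get("tags", {}))
        if missing:
            violations.append(f"{name}: missing tags {sorted(missing)}")
        location = resource.get("location", "").lower()
        if location not in APPROVED_REGIONS:
            violations.append(f"{name}: region {location!r} is not approved")
    return violations


# Hypothetical template fragment with one non-compliant storage account.
template = {
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "name": "stlogs001",
            "location": "eastus",
            "tags": {"owner": "platform-team"},
        }
    ]
}

for violation in precheck(template):
    print(violation)
```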

    Measure whether policy is improving the platform

    Azure Policy should produce operational outcomes, not just dashboards full of non-compliance counts. If the program is working, you should see fewer risky configurations, faster environment provisioning, less debate about standards, and better consistency across subscriptions. Those are platform outcomes people can feel. Raw violation totals only tell part of the story, because they can rise temporarily when your visibility improves.

    A useful governance review looks at trends such as how quickly findings are remediated, which controls generate repeated exceptions, which subscriptions drift most often, and which standards are still too hard to meet through the paved road. If policy keeps finding the same issue, that is usually a platform design problem, not just a team discipline problem.
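
    Two of those trends are straightforward to compute once findings and exemptions are exported somewhere queryable. The Python sketch below uses invented records and hypothetical field names (control, detected, fixed) to show time-to-remediate and repeat-exception counts.

```python
from collections import Counter
from datetime import date
from statistics import mean


def mean_days_to_remediate(findings):
    """Average days between detection and fix, over remediated findings only."""
    durations = [(f["fixed"] - f["detected"]).days for f in findings if f["fixed"]]
    return mean(durations) if durations else None


def repeat_exception_controls(exemptions, threshold=2):
    """Return controls exempted at least `threshold` times, most exempted first."""
    counts = Counter(item["control"] for item in exemptions)
    return [(control, n) for control, n in counts.most_common() if n >= threshold]


# Hypothetical sample data; "fixed" is None while a finding is still open.
findings = [
    {"control": "require-owner-tag", "detected": date(2024, 3, 1), "fixed": date(2024, 3, 5)},
    {"control": "require-owner-tag", "detected": date(2024, 3, 2), "fixed": date(2024, 3, 12)},
    {"control": "allowed-locations", "detected": date(2024, 3, 4), "fixed": None},
]
exemptions = [
    {"control": "allowed-locations"},
    {"control": "allowed-locations"},
    {"control": "require-owner-tag"},
]

print("mean days to remediate:", mean_days_to_remediate(findings))
print("repeat exceptions:", repeat_exception_controls(exemptions))
```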

    Governance works best when it feels like product design

    The healthiest Azure environments treat governance as part of platform product design. The platform team sets standards, publishes a clear path for meeting them, watches the data, and tightens enforcement in stages. That approach respects both risk management and delivery speed. Azure Policy is powerful, but power alone is not what makes it valuable. The real value comes from using it to make the secure, supportable path the easiest path for everyone building on the platform.